(57) Abstract

The system comprises an input means (57) such as a keyboard for specifying (entering) selected amino acid sequences and
other data such as temperature and fold preferences, a RAM (random access memory) (59) for storing such data, a ROM (read-only
memory) (61) with a stored program, a CRT (cathode ray tube) (63) display unit and/or printer (65), an optional auxiliary
disc storage device (67) for storage of relevant data bases, and a microprocessor (69) for processing the entered data, for simulating,
under control of the stored program, the folding of the protein from its unfolded state to its folded (tertiary) state, and for
displaying via the display unit (or printer) tertiary conformations of the protein in three dimensions.
SYSTEM AND METHOD FOR DETERMINING
THREE-DIMENSIONAL STRUCTURES OF PROTEINS 



Background of the Invention 

This invention relates to modeling systems 
generally, and particularly to computer-based 
simulation systems used in determining three- 
dimensional structures (tertiary native 
conformations) of globular protein molecules. 

The value of determining structure or
conformation of proteins is well known. For example,
in 1962 a Nobel Prize was awarded to Max Perutz for
his work in determining the structure of the
hemoglobin protein in blood. From this discovery, we
now understand more about sickle cell hemoglobin and
how drugs can be designed to treat patients with this
disorder.

The prediction of antigenic determinants
also is based on the prediction of protein tertiary
structure. One such scientific work is reported, for
example, by Hopp and Woods in "Prediction of protein
antigenic determinants from amino acid sequences",
Proceedings of the National Academy of Sciences USA
78, pp. 3824-3828 (1981), and in "A Computer Program
for Predicting Protein Antigenic Determinants",
Molecular Immunology Vol. 20, No. 4, pp. 483-489
(1983).

The structure (native conformation) of the
protein, particularly the conformation of the outer
sites or sidechains (which are linked to the backbone
and inner structures of the protein) often determines
the capacity of the protein to interact with other
proteins. One factor which directly influences
conformation is protein folding. Deciphering the
rules through which the building blocks (amino acid
sequences) of the protein affect folding promises
significant improvements in the design of proteins,
many with a host of new catalytic functions useful,
for example, in the chemical, food processing,
pharmaceutical, and other industries.

As a tool, computer systems are sometimes
used to combine and display protein structures. One
such system, used to convert two polypeptide chains
to a single polypeptide chain, is described for
example in U.S. Patent No. 4,704,692, entitled
"Computer Based System and Method for Determining and
Displaying Possible Chemical Structures for
Converting Double- or Multiple-Chain Polypeptides to
Single-Chain Polypeptides", issued November 3, 1987
to inventor Robert C. Ladner. Computer systems have
also been used to investigate protein structures and
predict protein folding. A few such uses have
been reported in Protein Folding by K. Go et al., pp.
167-81, ed. by N. Jaenicke, Amsterdam, Holland
(1980); Biopolymers by S. Miyazawa et al., 21:1333-63
(1982); and Journal of Molecular Biology, by M.
Levitt, 104:59-107 (1976).

These systems often (a) cannot process a
full sequence of amino acid residues of a protein or
protein segment (i.e., cannot process or otherwise
represent the interactions of all the residues of the
protein or protein segment; this task often becomes
intractable because the system is unduly
burdened by the many degrees of freedom of the
residues), (b) cannot complete the folding process
(because of the inability of the system to recognize
false, local energy-minima conditions), (c) cannot
represent tertiary conformations in three dimensions,
(d) cannot represent interactions between sidechains,
(e) do not display the pathway taken by a protein in
folding, or (f) do not permit free (unconstrained)
interactions between residues for more realistic
simulation of real proteins.

What is needed and would be useful, 
therefore, is a computer-based system that would 
eliminate the above-mentioned deficiencies, and 
provide a faster way of determining protein 
structures, thereby increasing the productivity of 
many scientists and encouraging the undertaking of 
many more needed investigations, including 
investigation of structures of protein sequences 
obtained from mapping of the human genome. 

Summary of the Invention 

Accordingly, an improved computer-based
system is provided that is capable of processing a
full sequence of amino acid residues of a protein
(e.g., plastocyanin), representing free
(unconstrained) interactions between residues and
between sidechains, tracking an entire folding
operation (pathway) from the protein's unfolded
(denatured) state to its fully folded (native) state,
and displaying tertiary conformations of the protein
in three dimensions.

The system comprises an input means such as
a keyboard for specifying (entering) selected amino
acid sequences and other data such as temperature and
fold preferences, a RAM (random access memory) for
storing such data, a ROM (read-only memory) with a
stored program, a CRT (cathode ray tube) display unit
and/or printer, an optional auxiliary disc storage
device for storage of relevant data bases, and a
microprocessor for processing the entered data, for
simulating, under control of the stored program, the
folding of the protein from its unfolded state to its
folded (tertiary) state, and for displaying via the
display unit (or printer) tertiary conformations of
the protein in three dimensions.

A novel lattice is employed for representing
(framing) the various conformations of the protein as
it folds from an unfolded sequence of amino acid
residues to a tertiary structure. The model
comprises a cubic arrangement of 24-nearest-neighbor
lattice sites, with adjacent sites located a unit
distance from each other, and adjacent α-carbons
located a distance of √5 units from each other. The
α-carbons represent a chain or backbone of the
protein. Each α-carbon is shown to occupy a central
cubic lattice site plus six adjacent cubic lattice
sites defining a surface of interaction (e.g., an
area or volume having a surface of finite size).
Each sidechain is represented as being embedded in
the lattice and occupying a selected number (four) of
lattice sites located relative to the central site,
the number of sites occupied by the sidechain being
proportional to the number of sites defining the
surface of interaction.

In response to specification of temperature
and the amino acid sequence of the protein, the
system determines the tertiary conformation of the
protein using Monte Carlo dynamics with an asymmetric
Metropolis sampling criterion. The system (a)
generates a three-dimensional representation of an
unfolded conformation consisting of an α-carbon
backbone and sidechains, (b) produces (in accordance
with local conformational preferences of the
residues, and the lowest total energy of interactions
between close sidechain pairs which satisfies the
criterion) successive likely conformations at the
temperature, according to the total energy of each
conformation, (c) selects from the successive likely
conformations the lowest total-free-energy tertiary
conformation which satisfies said criterion, and (d)
determines the coordinates of the selected tertiary
conformation for display. In producing successive
likely conformations, the system modifies each
conformation by moving randomly selected residues
(beads) and inter-residue bond vectors to different
selected lattice sites by performing various types of
moves (single-bead jump-type moves, two-bead end-flip
moves, chain-rotation type moves, and translation
wave-type moves).

By the method employed by this system,
simulation of protein folding and prediction of
tertiary structure are not only performed with
greater success and accomplished faster than by many
existing methods, but the simulation itself becomes
more manageable (tractable).

Brief Description of the Drawings 

Fig. 1 is a diagrammatic illustration of a
globular protein in its native folded conformation.

Fig. 2 is a diagrammatic illustration of a
full sequence of amino acid residues of which the
protein represented in Fig. 1 is comprised.

Fig. 3 is a block diagram of the system of
the present invention.

Fig. 4 is a block diagram showing a
perspective view of a cubic lattice model employed in
the system of Fig. 3.

Fig. 5 is a block diagram showing a segment
of a protein model comprising an α-carbon and
sidechain in a cubic lattice of the type shown in
Fig. 4.

Fig. 6 is a diagrammatic illustration of an
α-carbon backbone of a protein segment.

Fig. 7 is a diagrammatic illustration of an
α-carbon of the protein backbone segment shown in
Fig. 6.

Figs. 8A-8C are diagrammatic illustrations of
selected simple arrangements of an α-carbon backbone
and associated sidechains.

Fig. 9 is a diagrammatic illustration of a
jump-type move made by a randomly selected residue
(bead) within the lattice of Fig. 4, effecting a
change in conformation of the protein model.

Fig. 10 is a diagrammatic illustration of a
rotation-type move made by a pair of randomly
selected bond vectors within the lattice of Fig. 4,
effecting a change in conformation of the protein
model.

Fig. 11 is a diagrammatic illustration of a
translation-type (wave-type) move made by a U-shaped
segment within the lattice of Fig. 4, effecting a
change in conformation of the protein model.

Figs. 12A-12D are diagrammatic illustrations
of the folding of a selected segment of a protein to
a β-barrel conformation.

Figs. 13A-13C are graphs showing an average
number of native contact pairs between sidechains
versus time.

Figs. 14A and 14B are graphical
illustrations of a folding pathway defined by a
sequence as it folds from an unfolded state to a
folded (native) state.

Figs. 15A-15F are block diagrams (flow
charts) showing a method employed by the system of
Fig. 3 in simulating protein folding.

Fig. 16 is a block diagram showing an
alternate embodiment of the processor of Fig. 15.

Detailed Description of the Invention 

A simplified representation of a globular
protein (e.g., plastocyanin) in its native (natural,
folded) form is shown in Figure 1. A simplified
representation of a full sequence of amino acid
residues of which the protein is comprised is shown
in Figure 2. The protein becomes unfolded
(denatured) when it is heated to an elevated
temperature, and it refolds to its native form when
the temperature is lowered to a selected level.
Temperature may be specified in any unit (whether
Fahrenheit, centigrade, or Kelvin) and at any level
or value (whether in or outside the transition range
of the protein) as explained hereinafter. Generally,
depending on the native biological conditions of the
particular protein molecule being investigated, the
temperatures that are specified are those in and
bordering the transition region of the protein
(typically, in and above 35°C-45°C).

Given a sequence of amino acid residues of a
known or unknown protein, it would be useful, for
example in the designing of a drug, to know to what
protein form (structure, conformation) the sequence
would fold if selected residues were changed
(modified).

To determine the probable tertiary structure
(three-dimensional conformation) to which a given
sequence or modified sequence would fold, a
simulation of the folding operation could be
performed on a computer system of the type shown in
Fig. 3. The system uses a "210" lattice model, as
shown in Fig. 4. The system is described in detail
hereinafter. Prior to description of the system,
however, to facilitate understanding of the
invention, other aspects of the invention (such as
lattice arrangement, types of movement of segments
(residues) of protein within the lattice,
orientational states of a segment, and inter-residue
interaction) are described below.

Lattice Model, and Positioning of Protein 
Conformation 

Referring now to Fig. 5, a section or
segment 11 of a full sequence (e.g., a sequence of a
protein much like that depicted in Fig. 2) is shown
in stick form (without associated residues or atomic
structure). The section 11 includes an α-carbon
segment 13 and a sidechain (β-carbon) segment 15
representative of each amino acid residue of the
protein.

The protein segments may be viewed as
embodied within a cubic reference framework or
lattice model (Fig. 4), constructed from vectors of
the type (1,0,0), (0,1,0), (0,0,1), the distance
between any two adjacent points being unity. The
α-carbon atoms 13 when linked as shown in Fig. 6 form
the backbone 14 of the protein. As shown in Figs. 4
and 7, each α-carbon 13 may be viewed as occupying a
central cubic site 17 plus six adjacent cubic sites
18-23, defining a finite surface of interaction.
Adjacent α-carbon centers may be viewed as linked by
a 210-type lattice vector 25, as shown in Fig. 4.

The backbone 14 (Fig. 6) represents a
structure of finite thickness about which a somewhat
inflexible, hard core envelope of a chain of residues
develops. The conformation of the backbone at the i-th
α-carbon is specified in terms of $r^2_{\theta i}$, the square of
the distance between α-carbons i-1 and i+1, where θ
represents the bond angle that one of these
α-carbons makes with respect to the other, as
shown in Fig. 6. In model units, the distance
between consecutive α-carbons equals √5 units.
Selected values of $r^2_{\theta}$ are 6, 8, 10, 12, 14, 16, and
18, expressed in model units, indicating various
accessible bond angle states. These values represent
internal orientational states corresponding to actual
(known) physical conformations.
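
The geometry described above can be checked with a short sketch. It assumes only what the text states: bond vectors are the 24 permutations of (±2, ±1, 0), and the bond angle state is the squared distance between α-carbons i-1 and i+1. The function and variable names are illustrative and do not come from the program listing in the appendices.

```python
from itertools import permutations, product


def lattice_vectors_210():
    """Return the distinct bond vectors of type (±2, ±1, 0)."""
    vecs = set()
    for perm in permutations((2, 1, 0)):
        for signs in product((1, -1), repeat=3):
            # The sign applied to the zero component creates duplicates,
            # which the set removes.
            vecs.add(tuple(s * c for s, c in zip(signs, perm)))
    return sorted(vecs)


def squared_distances():
    """Squared distance between alpha-carbons i-1 and i+1 for every bond pair."""
    vecs = lattice_vectors_210()
    return sorted({sum((a + b) ** 2 for a, b in zip(u, v))
                   for u in vecs for v in vecs})


VECS = lattice_vectors_210()
ALL_R2 = squared_distances()
# Assumption taken from the text: only states 6 through 18 are accessible.
ALLOWED_R2 = [r2 for r2 in ALL_R2 if 6 <= r2 <= 18]
```

Under these assumptions every bond has squared length 5 (length √5), and the seven accessible states 6 through 18 are what remains after the geometrically possible values 0, 2, 4, and 20 are excluded.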

As shown in Figs. 5 and 8, each α-carbon has
attached to it a sidechain 15, constructed for
example in a helix conformation 27, or in a β-strand
conformation 29. From the central vertex portion 31
of the α-carbon, the sidechain 15 is formed,
comprising four lattice vector points (1,1,0),
(1,1,0), (1,1,0), and (1,1,1) 33. Three points
represent fcc-type (face-centered cubic) lattice
vectors, i.e., vectors of the type (±1, ±1, 0). The
fourth point represents a diamond lattice vector of
the type (±1, ±1, ±1). This latter vector serves as
the center of hydrophobic or hydrophilic interactions
(explained hereinafter). The orientation of the
sidechain depends on the backbone conformation, i.e.,
depends on $r^2_{\theta}$. At least two of the three fcc
vectors comprising the sidechain are shown in an
L-conformation (i.e., with left-handed chirality). The
diamond lattice-type vector is always shown in the
L-conformation. (For a more detailed description of
lattice rules which should be followed when
constructing conformations, refer to Appendix A.)
For the calculations described hereinafter, either
the residues are glycine, in which case there is no
sidechain, or the residues have a sidechain of
uniform size.






Interactions Between Residues 

The following is a description of how the
210 lattice model (Fig. 4) is used to denote
interactions between elements (residues) of a given
backbone conformation, and to denote the energy of
such interactions. To specify the conformation of
the backbone of a chain, composed of n residues on an
α-carbon representation, n-2 bond angles (θ) and n-3
torsional angles (φ) must be specified. To determine
the conformation of the first and last residues, a
virtual residue is appended to each end of the chain.
These virtual residues are represented as inert.
They occupy space but are devoid of sidechains.
Thus, with the addition of the two fictitious
(virtual) residues, n bond angles and n-1 torsional
angles can now be used to specify the backbone
conformation of the chain. (For convenience in
denoting segments, the residues of the chain may be
numbered from 1 to n.)

With respect to expressing (representing) a
preference for a given conformation, any intrinsic
preference of the protein model for a particular
conformation may be represented by the individual
preferences of the respective residues for the
various bond angle states. In the description that
follows, the term local conformational preferences
shall mean the relative preferences which each local
group of residues (i.e., a selected residue plus two
flanking (adjacent) residues on either side of the
selected residue) exhibits for the different
conformational states. As indicated previously,
these states are represented by the value $r^2_{\theta}$ of the
lattice model. Since for every residue i there are
seven distinct values of $r^2_{\theta i}$, corresponding to 18
distinct local conformational states, the local
energetic preference (denoted as parameter $\epsilon_\theta(r^2_{\theta i})$)
for each of the states ($r^2_{\theta i}$ values) must be
specified. If it is desired to reduce the number of
such adjustable parameters (that is, parameters
requiring specification), the conformations (except
conformations where $\epsilon_\theta(r^2_{\theta i}) = 0$) may be made
isoenergetic and assigned the value $\epsilon_\theta > 0$.

In addition to bond angle, the torsional
(dihedral angle) potential of a residue (i.e., its
tendency to undergo an angle of rotation or twist)
must be specified. The torsional potential
associated with the i-th residue is specified in terms
of residues (i-1) through (i+2). Actually, a
dihedral angle potential must be specified in the
model for all residues from residue 2 (corresponding
to real residue 1) to residue n-2 (corresponding to
real residue n-1). Because the model is confined to
a lattice, it is convenient to describe the torsional
potential associated with the i-th residue in terms
of: (a) $r^2_{\theta i}$, $r^2_{\theta i+1}$, the bond angle states i and i+1,
(b) $r^2_{\phi i}$, the square of the distance between α-carbons
i-1 and i+2, and (c) the handedness of the dihedral
angle, χ = +1 for right-handed chirality (R) or
χ = -1 for left-handed chirality (L). For example, a
planar state having φ = 0 is specified by (16, 16,
37, -1). That is, the square of the distance between
α-carbons i-1 and i+1 is 16, between α-carbons i and
i+2 is 16, and between α-carbons i-1 and i+2 is 37.
(For definiteness in the calculation, a dihedral
angle of 0 is taken to be left-handed. This
conformation could also be specified by the vectors
$b_i$, $b_{i+1}$, $b_{i+2}$ as shown in Figure 8.) As many as 324
rotational states exist for each internal bond.
These rotational states are all assigned a relative
energy value for ($r^2_{\theta i}$, $r^2_{\theta i+1}$, $r^2_{\phi i}$, χ). Generally, all
such rotational states are statistically weighted.
Where the majority of the conformations are taken to
be isoenergetic (with a small bias toward a small
subset of conformations that are native), the short
and intermediate range energetic preferences may be
represented as $\epsilon(r^2_{\theta i}, r^2_{\theta i+1}, r^2_{\phi i})$.

The seven lattice sites that define the
α-carbon (Fig. 7) and the four lattice sites (Fig. 5)
that define the surface 24 of the sidechain interact
repulsively (i.e., with strong, hard core repulsion)
with all the other α-carbons and their respective
sidechains. In other words, no more than one
sidechain or α-carbon can simultaneously occupy a
given lattice site. (This is generally referred to
as the excluded volume criterion.) Such a model may
be viewed as having a backbone of finite thickness.
In addition to the hard core repulsion, described
above, there is a weak (soft core) repulsive
interaction between non-bonded α-carbon backbone
centers located within a distance of √5 model units
of each other. If $r_{kl}$ represents the distance
between the k-th and l-th such centers, then the soft
core repulsive energy $\epsilon_{rep,kl}$ between the pair may be
expressed as:

\[
\epsilon_{rep,kl} =
\begin{cases}
\infty, & r^2_{kl} = 0, 1, 2, 4 \\
\epsilon_{rep}, & r^2_{kl} = 3 \\
3\epsilon_{rep}, & r^2_{kl} = 5 \\
0, & \text{otherwise}
\end{cases}
\]

($\epsilon_{rep}$ typically takes on the value of 6 in the
calculations that follow.)

Following description of the lattice, bond
angle, bond angle states, and torsional angles, a
description of tertiary interactions between the
residues in a three-dimensional setting is presented
next. To represent the effect of hydrogen bonding
and dipolar-type interactions, a cooperative
interaction energy parameter $\epsilon_c$ is introduced which
allows for secondary structure stabilization when any
part of the α-carbon hard core envelope of the l-th
residue is at a distance of 3 units from the α-carbon
center of the k-th residue.

If a pseudodot product between two vectors
is defined as:

\[
\mathrm{dot}(b_k, b_l) =
\begin{cases}
1, & b_k = \pm b_l \\
0, & \text{otherwise}
\end{cases}
\]

then the cooperative interaction energy $\epsilon_{c,kl}$ may be
given by:

\[
\epsilon_{c,kl} = \epsilon_c \left[ \mathrm{dot}(b_k, b_l) + \mathrm{dot}(b_{k+1}, b_l) + \mathrm{dot}(b_k, b_{l+1}) + \mathrm{dot}(b_{k+1}, b_{l+1}) \right]
\]

where $\epsilon_c$ represents an energetic preference
parameter which is applied, uniformly, to all residue
pairs independent of their conformation.
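
As a minimal sketch of the pseudodot product and the cooperative interaction energy defined above, the following assumes bond vectors are stored as integer tuples in a list indexed by residue. The distance test (3 units between the envelope of residue l and the center of residue k) is deliberately left to the caller, and the names `pseudodot` and `cooperative_energy` are illustrative rather than taken from the appendices.

```python
def pseudodot(a, b):
    """Return 1 when a = +b or a = -b (parallel or antiparallel bonds), else 0."""
    neg_b = tuple(-x for x in b)
    return 1 if a == b or a == neg_b else 0


def cooperative_energy(bonds, k, l, eps_c):
    """Cooperative energy for residues k and l from the four bond-vector pairs.

    Assumes the caller has already verified the 3-unit distance condition;
    eps_c is the uniform energetic preference parameter.
    """
    total = (pseudodot(bonds[k], bonds[l])
             + pseudodot(bonds[k + 1], bonds[l])
             + pseudodot(bonds[k], bonds[l + 1])
             + pseudodot(bonds[k + 1], bonds[l + 1]))
    return eps_c * total
```

For two antiparallel two-bond stretches, two of the four pseudodot terms equal 1, so the pair contributes twice the parameter value.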

Sidechain Interactions 

In the preceding section, the subject of
interactions relating to backbone conformation was
discussed. In the following section, the subject of
interactions between sidechains is discussed.
Sidechain interactions are treated as being
independent of backbone conformation. Interactions
between any pair of sidechains are allowed if the
interacting sidechain sites lie at a distance of √2
from each other. Sidechains may be hydrophobic,
hydrophilic or inert. Pairs of hydrophobic
sidechains interact with an attractive potential of
mean force; hydrophobic/hydrophilic pairs interact
with a repulsive potential of mean force; and
hydrophilic pairs interact weakly (i.e., weakly
attractive or repulsive with no qualitative change in
behavior).

With respect to the calculation of
sidechain-sidechain interaction energy, the following
rules (scales) were employed in one calculation:
glycines were assumed to lack sidechains and were
assigned a hydrophobicity index h(i) = 0.
Hydrophobic residues were assigned a negative
hydrophobicity index h(i) < 0, and hydrophilic
residues were assigned a positive hydrophobicity
index h(i) > 0. For all sidechains that were greater
than two residues apart down the chain, the
sidechain-sidechain interaction matrix am(i,j),
representing the interaction energy between the i-th
and the j-th pair of sidechains, was given in the
form:

\[
am(i,j) = -h(i) \, h(j) \, \epsilon
\]

where $\epsilon = \epsilon_{phobe-phobe} > 0$ if h(i) and h(j) are both
negative (that is, if both are hydrophobic),
$\epsilon = \epsilon_{phobe-phil} > 0$ if one residue is hydrophobic and the
other hydrophilic, and $\epsilon = -\epsilon_{phil-phil}$ (with $\epsilon_{phil-phil} > 0$)
if both h(i) and h(j) are positive, that is, if
both sidechains are hydrophilic. The subscript
phobe-phobe represents interaction
between two hydrophobic residues, phobe-phil
represents interaction between a hydrophobic residue
and a hydrophilic residue, and phil-phil represents
interaction between two hydrophilic residues. (As
indicated above and in the program listing shown in
Appendix D, tertiary interactions between any
spatially close pair of sidechains are implemented
using a modified Miyazawa-Jernigan (MJ)
hydrophobicity scale. Based on the frequency of
occurrence of contacts between sidechain pairs in
protein crystal structures, the MJ scale is used to
determine effective inter-residue contact energies.)

WO 91/16683                                   PCT/US91/02786
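
The sign conventions of the sidechain-sidechain interaction matrix can be illustrated with a small sketch. It follows the rule am(i,j) = -h(i)·h(j)·ε stated above; the function name, argument names, and the numeric values in the usage note are hypothetical and are not the modified Miyazawa-Jernigan values of Appendix D.

```python
def sidechain_energy(h_i, h_j, eps_phobe_phobe, eps_phobe_phil, eps_phil_phil):
    """Interaction energy am(i, j) for two sidechains with indices h_i, h_j.

    Convention from the text: h < 0 hydrophobic, h > 0 hydrophilic,
    h == 0 glycine (no sidechain). All three eps arguments are positive.
    """
    if h_i == 0 or h_j == 0:
        return 0.0  # a glycine has no sidechain to interact with
    if h_i < 0 and h_j < 0:
        eps = eps_phobe_phobe      # both hydrophobic: attractive result
    elif h_i > 0 and h_j > 0:
        eps = -eps_phil_phil       # both hydrophilic: weak interaction
    else:
        eps = eps_phobe_phil       # mixed pair: repulsive result
    return -h_i * h_j * eps
```

With illustrative values `eps_phobe_phobe = 1.0`, `eps_phobe_phil = 1.0`, and `eps_phil_phil = 0.25`, a hydrophobic pair with h = -1 gives a negative (attractive) energy and a mixed pair gives a positive (repulsive) one, matching the qualitative description of the potentials of mean force.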

As used below, short-range interactions
shall mean interactions between adjacent residues in
the chain and do not include effects of their
neighbors (i.e., neighboring residues in the chain).
Medium-range interactions shall mean interactions
between first, second, and third nearest-neighbor
residue groups in the chain. Long-range interactions
shall mean interactions between residues (not
α-carbons) which are positioned greater than three
nearest neighbors apart down the chain but which are
spatially close (i.e., within 3 Å of each other).

Both native and non-native interactions are
allowed between non-bonded pairs of residues that are
spatially close enough to interact. No criterion or
constraint is imposed to drive the simulation towards
any predetermined native conformation. Based on
long- or short-range interactions, a native conformation may
comprise one of a number of isoenergetic states. It
is the juxtaposition of short-, medium-, and long-range
interactions, together with other factors described
herein, that produces the final result, namely a
stable, folded conformation.

As described hereinafter, all of the
energetic parameters, $\epsilon_\theta$, $\epsilon_\phi$, $\epsilon_{rep}$, $\epsilon_{phobe-phobe}$, $\epsilon_{phobe-phil}$,
and $\epsilon_{phil-phil}$, are uniformly scaled by a reduced temperature
factor, T.

With respect to specifying other
characteristics of the primary sequence of amino acid
residues, the following conventions are used. In a
simplified model, the term $B_i(k)$ is used to represent
the i-th stretch of k residues in the sequence. The k
residues are represented as having identical $\epsilon_\theta$ and
$\epsilon_\phi$ values and a marginal (short and intermediate
range) preference for β-state conformation.
Consistent with β-sheet formation, $B_i(k)$ also
represents an alternating odd/even pattern of
hydrophilic and hydrophobic residues.

Where a sequence of k residues is locally
indifferent to whether it is in an α-helix or in a
β-sheet, the term $AB_i(k)$ may be used to denote the
i-th stretch in the amino acid sequence containing k
residues in an alternating hydrophobic/hydrophilic
pattern, such that $\epsilon_\theta(12) = \epsilon_\theta(16)$ for all k
residues. Where a sequence of k residues has an
alternating hydrophobic/hydrophilic pattern and
locally prefers α-helical state conformation, such
that $\epsilon_\theta(12) = 0$ and $\epsilon_\theta(16) > 0$, this is denoted by
the shorthand notation $A_i(k)$.

Putative bend regions are denoted by $b_i(j)$,
and consist of j residues located at the interface
between putative β-stretches i and i+1.

Chain Dynamics. Modification of Conformations 

The dynamics of the chain are simulated by a
(pseudo) random sequence of conformational
rearrangements (moves) (i) through (iv) described
below. In all such moves, the bead (amino acid
residue) on which the move is performed is chosen at
random.

(i) Examples of single bead jumps (also
referred to as flips, spike or kink moves) are shown
in Figure 9. Also, a representative set of
single-bead modifications is listed in Table I. These moves
are constructed by conserving the vector $b_{i-1} + b_i$
(i.e., changing neither the magnitude nor the direction of
the imaginary vector ($b_{i-1} + b_i$)). The moves are made in a
manner which maintains the bond angle associated with
the i-th residue but changes the bond angles of the
i-1 and i+1 residues. [...]
five distinct possible outcomes (associated with $r^2_{\theta i}$
= 12), each of the moves is coded with five outcomes,
some of which are degenerate (i.e., their
conformations each have the same energy). A clock is
used to sequentially choose the particular outcome.
New conformations of jumps (kinks) are also generated
at random. After a move has been selected, it is
only accepted if the adjacent bond angles are allowed
(i.e., $r^2_{\theta i+1}$ and $r^2_{\theta i-1}$ must lie in the range 6-18).
If the move satisfies these local geometric
constraints, then the sites (seven backbone sites
plus four sidechain sites) into which the bead will
jump are checked to ensure that they are unoccupied.
If the geometric constraints are not met or any of
these sites is occupied, the move is rejected (not made).

A list of sample single-bead, modified
vector values is presented in Table I.
TABLE I

Sample Single Bead Modification Data

CONFORMATION   EXAMPLE SEQUENCE OF      POSSIBLE
$r^2_\theta$   INITIAL VECTORS          MODIFICATIONS

 2             (excluded)
 4             (excluded)
 6             (2,-1,0), (0,2,1)        a. (0,2,1), (2,-1,0)
                                        b. (2,0,-1), (0,1,2)
                                        c. (0,1,2), (2,0,-1)
 8             (1,2,0), (-1,0,2)        a. (-1,0,2), (1,2,0)
                                        b. (1,0,2), (-1,2,0)
                                        c. (-1,2,0), (1,0,2)
10             (1,2,0), (2,-1,0)        a. (2,-1,0), (1,2,0)
12             (1,2,0), (1,0,2)         a. (1,0,2), (1,2,0)
                                        b. (2,1,0), (0,1,2)
                                        c. (0,1,2), (2,1,0)
                                        d. (2,0,1), (0,2,1)
                                        e. (0,2,1), (2,0,1)
14             (2,-1,0), (0,-2,1)       a. (0,-2,1), (2,-1,0)
16             (1,2,0), (-1,2,0)        a. (-1,2,0), (1,2,0)
                                        b. (0,2,1), (0,2,-1)
                                        c. (0,2,-1), (0,2,1)
18             (-1,2,0), (0,2,1)        a. (0,2,1), (-1,2,0)
               or
               (-2,1,0), (-1,2,0)       a. (-1,2,0), (-2,1,0)
20             (excluded)
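
The entries of Table I can be reproduced with a short sketch. It assumes, as the text states, that a single-bead jump replaces the two bonds meeting at the bead with any other ordered pair of 210-type vectors having the same sum. Bond angle and excluded volume checks are omitted, and the function names are illustrative.

```python
from itertools import permutations, product


def lattice_vectors_210():
    """Return the 24 distinct vectors of type (±2, ±1, 0)."""
    vecs = set()
    for perm in permutations((2, 1, 0)):
        for signs in product((1, -1), repeat=3):
            vecs.add(tuple(s * c for s, c in zip(signs, perm)))
    return sorted(vecs)


def single_bead_modifications(b_prev, b_next):
    """List every other bond pair (v, w) with v + w == b_prev + b_next."""
    vecs = set(lattice_vectors_210())
    total = tuple(a + b for a, b in zip(b_prev, b_next))
    outcomes = []
    for v in sorted(vecs):
        w = tuple(t - c for t, c in zip(total, v))
        # Keep only valid lattice vectors and skip the starting pair itself.
        if w in vecs and (v, w) != (b_prev, b_next):
            outcomes.append((v, w))
    return outcomes
```

For the initial pair (1,2,0), (1,0,2) this enumeration yields five alternatives, and for (2,-1,0), (0,2,1) it yields three, in agreement with the rows for states 12 and 6 of the table.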
(ii) With respect to two-bead end flips (in
which the two end bonds are transformed to a new set
of vectors), the set of two vectors is chosen at
random from the twenty-four possible orientations of
the lattice vectors. In this case, the two new end
bond vectors must satisfy the allowed local bond
angle criteria. If they do not, the move is
rejected. Further, the two end residues in their new
conformation must not violate excluded volume
constraints.

The above-mentioned moves (i) and (ii)
satisfy the correct dynamics for the athermal random
coil state in the absence of hydrodynamic
interactions.

(iii) Turning now to chain rotations, an
example of this type of move is shown in Fig. 10. The
minimum size unit selected for rotation consists of
three beads, and the maximum size unit is 2+wave.
(The value of the parameter "wave" is generally 4; it
is chosen so that the size of the unit undergoing the
rotation is the size of a mean element of secondary
structure.) The particular size of the unit (δ+1)
undergoing the attempted rotation is chosen by the
value of an external clock parameter, and
sequentially varies from the minimum to maximum size.
A particular bead I, at one end of the rotating unit,
is chosen at random. For beads less than n/2, the
unit undergoing the rotation is I-δ. For beads
greater than n/2, the unit undergoing rotation is
I+δ. If ib represents the first residue at the
beginning of the rotating unit, and iend represents
the residue at the end of the rotating unit, then if
the bond angle state between the vectors $b_{ib}$ and $b_{iend-1}$
is a 14-18 state, the rotation is attempted. (The
range of values of $r^2_{\theta_i}$ is chosen so that the rotation
is physically possible.) The rotation is implemented
by interchanging the two bond vectors (e.g., vectors
35, 37 joining randomly selected bead 39 shown in
Fig. 10). The initial set of bond vectors joining
residues ib to iend is ($b_{ib}$, $b_{ib+1}$, ..., $b_{iend-2}$, $b_{iend-1}$).
The final set of bond vectors is ($b_{iend-1}$, $b_{ib+1}$, ...,
$b_{iend-2}$, $b_{ib}$). The new conformation is checked to
ensure that it can join the remainder of the chain
without violating bond angle restrictions and
excluded volume restrictions.
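The interchange of the two end bond vectors of the rotating unit can be illustrated with a short sketch. It assumes a 0-based list in which bond k joins bead k to bead k+1; the helper names are hypothetical, and the bond angle and excluded volume checks are left out. Because only the order of the bond vectors changes, their sum is preserved, so the beads at both ends of the unit stay where they were.

```python
def rotate_unit(bonds, ib, iend):
    """Return a copy of bonds with bonds[ib] and bonds[iend - 1] interchanged."""
    new_bonds = list(bonds)
    new_bonds[ib], new_bonds[iend - 1] = new_bonds[iend - 1], new_bonds[ib]
    return new_bonds

def bead_positions(bonds, origin=(0, 0, 0)):
    """Accumulate bond vectors into bead coordinates, starting at origin."""
    positions = [origin]
    for b in bonds:
        last = positions[-1]
        positions.append(tuple(p + q for p, q in zip(last, b)))
    return positions
```

Comparing `bead_positions` before and after `rotate_unit` shows that only the interior beads of the unit move.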

(iv) Internal wave-like motions such as are
shown in Figure 11 are also performed. These moves
serve to propagate defects down the subchain by
deleting a defect at one end of the subchain and
creating the defect at the other end of the subchain.
The defect propagation procedure is performed by the
system as follows. I denotes a bead chosen at
random. The system first determines if a U-shaped
defect exists (i.e., does $b_I = -b_{I+3}$?). If not, the
attempt at wave-like motion is abandoned. If a
defect exists, the system then picks a place where
the defect should be inserted. The chosen point is
at JJ = I+2 ± (5+δ), with δ varying between 0 and
wave-1. About half of the time, the defect insertion
point lies to the left of I, and the other half of the
time it lies to the right of it. As mentioned
before, typically, the value 4 is selected for wave.
As shown in Fig. 11, the bond vectors $b_I$ 41 and $b_{I+3}$
43 are then sliced out of the chain, thereby deleting
two beads 43 and 47, provided that $b_{I+1}$ 49 and $b_{I+2}$ 51,
which will form the new bond angle state or vertex
I+1 53, satisfy the local geometric constraints of
the chain. Next, two bonds 49, 51 are inserted into
the chain. If the original vectors associated with
beads JJ-1 and JJ are $b_{JJ-1}$ and $b_{JJ}$, the new set of
four vectors is (v, $b_{JJ-1}$, $b_{JJ}$, -v), where the vector
v is chosen at random. Note that the intervening
bond vectors between I+4 and JJ-2 are left unchanged.
A new conformation is then generated by renumbering
the residues so that their identity is conserved. As
before, both excluded volume and local bond angle
criteria must be satisfied in order for the
conformation to be accepted.
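The first two decisions of the wave-like move, detecting the U-shaped defect and picking the insertion point, can be sketched as below. The sketch assumes the defect test is read as "bond I is the exact negative of bond I+3" and the insertion point as JJ = I+2 ± (5+δ) with δ drawn from 0 to wave-1; both readings, and the function names, are assumptions made for illustration. Slicing out and re-inserting the bonds is not shown.

```python
import random

def has_u_defect(bonds, i):
    """True if bonds[i] is the exact negative of bonds[i + 3]."""
    if i < 0 or i + 3 >= len(bonds):
        return False
    return all(a == -b for a, b in zip(bonds[i], bonds[i + 3]))

def insertion_point(i, wave=4, rng=random):
    """Pick JJ = I + 2 +/- (5 + delta), with delta in 0..wave-1."""
    delta = rng.randrange(wave)
    sign = rng.choice((1, -1))  # left or right of I, each about half the time
    return i + 2 + sign * (5 + delta)
```

A caller would still need to confirm that the returned index lies inside the chain before attempting the insertion.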

After each of the elemental moves (i)-(iv)
described above, the energy of the new
conformation, $E_{new}$, is calculated and compared to the
energy of the old conformation, $E_{old}$. $E_{new}$ represents
the sum of the individual energies, and is expressed
as:

$$E_{new} = E_\theta + E_\phi + E_c + E_s$$

where

$$E_\theta = \sum^{N} \epsilon_\theta, \qquad E_\phi = E_{tor}, \qquad E_c = \sum \epsilon_{c,kl},$$

and $E_s$ (also referred to as $E_{side}$) is

$$E_s = \frac{1}{2} \sum_{i,j} \epsilon_{am(i,j)}$$

(The term $E_{old}$ represents the initial total value,
then successive previous total values with which $E_{new}$
is compared.)

With respect to free energy (as distinct
from total energy), the system attempts to find a
free energy minimum, given as:

Free energy = Total energy - TS

where T represents temperature, and S represents
entropy.

If $E_{new}$ is less than $E_{old}$, then the
conformation is accepted. Otherwise, a Metropolis
sampling criterion is applied (as described, for
example, in Monte Carlo Methods in Statistical Physics,
2nd ed., by K. Binder, Springer-Verlag, Berlin, New
York, 1986). In that event, a random number R
uniformly distributed between 0 and 1 is generated.
If R is less than the probability P, where

$$P = \exp\left(-\frac{E_{new} - E_{old}}{k_B T}\right)$$

then the conformation is accepted; otherwise, it is
rejected. Here, $k_B$ represents Boltzmann's constant
and T represents the absolute temperature of the
protein. Thus, a standard asymmetric Metropolis
sampling scheme (criterion) is employed. As
described below, the sampling scheme or criterion is
applied in conjunction with a dynamic Monte Carlo
technique (as described, for example, in Monte Carlo
Methods in Statistical Physics by K. Binder, cited
above).
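The acceptance rule can be written as a few lines of code. This is a minimal sketch, assuming the standard Metropolis form P = exp(-(E_new - E_old)/k_B T) and treating the product k_B T as a single parameter `kT`; the function name and the injectable `rng` argument are illustrative choices, not part of the described system.

```python
import math
import random

def metropolis_accept(e_new, e_old, kT, rng=random):
    """Accept downhill moves always; accept uphill moves with P = exp(-dE/kT)."""
    if e_new < e_old:
        return True
    p = math.exp(-(e_new - e_old) / kT)
    return rng.random() < p
```

Passing a seeded `random.Random` instance as `rng` makes a simulation run reproducible.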

A single Monte Carlo dynamics time step
consists of N attempts at move type (i) (jump-type
move) mentioned above, two attempts at move type (ii),
where each of the chain ends is subjected to move
type (ii), one attempt at move type (iii), and one
attempt at move type (iv). In the simulation, the
protein model is started out in a randomly generated
high temperature (T) state. It is then cooled down,
equilibrated, and cooled further, until collapse to a
folded conformation occurs. For each simulation run
in the transition region between unfolded and folded
states, at least $1.25 \times 10^6$ Monte Carlo time steps
are sampled. The set of elemental moves employed in
the simulation satisfies the well known stochastic
kinetics master equation describing the dynamics of
the system. (Refer, for example, to Appendix B.) In
the limit (after a large number of steps), an
equilibrium distribution of states is generated.
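The composition of one time step can be sketched as a small driver. The four move callables are hypothetical placeholders supplied by the caller, each assumed to attempt its move and apply the acceptance test itself; the order in which the move types are tried within a step is also an assumption of this sketch.

```python
def monte_carlo_time_step(n_beads, jump, end_flip, rotate, wave):
    """Run one time step: N jumps, two end moves, one rotation, one wave move.

    Each argument after n_beads is a caller-supplied (hypothetical) callable
    that attempts a move and applies the acceptance test itself.
    """
    for _ in range(n_beads):
        jump()
    end_flip(0)            # first chain end
    end_flip(n_beads - 1)  # last chain end
    rotate()
    wave()
```

Keeping the moves as separate callables lets each one be tested on its own before the full step is assembled.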

With respect to the thermodynamics of
folding, a detailed explanation is presented below.
By restricting the protein to the lattice, it may be
treated as a rotational isomeric state model of the
protein. First, the transition from the denatured to
the native state is treated in the context of a two-
state model. The free energies of the denatured
state, $A_D$, and the native state, $A_N$, are calculated as
follows: $A_D$ is calculated by neglecting all tertiary
interactions in the denatured state (although
pentane-like effects are included). In the
calculation of $A_D$, long range excluded volume effects
are neglected. For the calculation of $A_N$, small
local fluctuations about the native state are
neglected, and $A_N$ is approximated by the energy of
the native state, $E_N$.

In the context of a two-state model for
folding, the fraction of molecules in the native
state, $f_N$, is given by

$$f_N = \frac{\exp\{-(E_N - A_D)\}}{1 + \exp\{-(E_N - A_D)\}} \qquad (2)$$

where $A_D$ is given as:

$$A_D = -k_B T \ln(Z_D) \qquad (3)$$

K-l 

(The term Z D may be expressed as Z D = Jy v d.i j / as 
defined in Appendix C.) 
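Equation (2) can be evaluated directly, as in the sketch below. It assumes that the native energy and the denatured-state free energy are both supplied in the same reduced (temperature-scaled) units, since the expression as read here carries no explicit k_B T factor; the function name is illustrative. The two branches are algebraically identical and only avoid overflow in the exponential.

```python
import math

def fraction_native(e_native, a_denatured):
    """Equation (2): f_N = exp(-x) / (1 + exp(-x)), with x = E_N - A_D."""
    x = e_native - a_denatured
    if x >= 0:
        w = math.exp(-x)
        return w / (1.0 + w)
    return 1.0 / (1.0 + math.exp(x))
```

When the two arguments are equal, the function returns one half, which is the transition midpoint of the two-state model.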

In the context of the two-state model, the
mean square radius of gyration $\langle S^2 \rangle$, defined as

$$\langle S^2 \rangle = \left\langle \frac{1}{N} \sum_{i=1}^{N} |r_i - r_{cm}|^2 \right\rangle \qquad (4)$$

with $|r_i - r_{cm}|$ representing the distance of the i-th bead
from the center of mass $r_{cm}$, may be expressed as

$$\langle S^2 \rangle = f_N \langle S_N^2 \rangle + (1 - f_N) \langle S_D^2 \rangle \qquad (5)$$

where $\langle S_N^2 \rangle$ and $\langle S_D^2 \rangle$ are the mean square radii of
gyration in the native and denatured state,
respectively.
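The two quantities can be computed as in the following sketch, which assumes equal-mass beads given as (x, y, z) tuples and evaluates the instantaneous value for a single conformation (the ensemble average would be taken over many such values). The function names are illustrative.

```python
def squared_radius_of_gyration(coords):
    """Mean squared distance of the beads from their center of mass."""
    n = len(coords)
    center = [sum(c[k] for c in coords) / n for k in range(3)]
    return sum(
        sum((c[k] - center[k]) ** 2 for k in range(3)) for c in coords
    ) / n

def two_state_s2(f_native, s2_native, s2_denatured):
    """Equation (5): population-weighted mean of the two-state values."""
    return f_native * s2_native + (1.0 - f_native) * s2_denatured
```

The second function is just a linear interpolation between the denatured and native values, weighted by the native fraction.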

The above explanation may be used to select 
appropriate temperature values for use in the 
simulation. Substantial computer time can be saved 
by avoiding high temperatures associated with the 
denatured state. Also, temperatures that are too low 
can drastically quench the system. 

Conformational Transitions 

As shown below, conformational transitions 
can be approximated by a two-state model, or can be 
determined directly from folding trajectories. 

In the following paragraphs, the 
requirements for folding to a unique conformation 
(e.g., a four-member β-barrel state) are described.
Figs. 12A-D show a segment with backbone
α-carbons 101 and interacting sidechain sites 103.
Also shown in the top view are hydrophobic core 105
with the interdigitating sidechains 107, 109. Also
shown are the corresponding conformations 111, 113
with α-carbons alone.

The first of the three native turns is shown
to involve the eighth through eleventh residues with
backbone bond angle conformations 18, 8, 18, and 10,
respectively. The central turn is shown to involve a
crossover connection between the two anti-parallel
β-strands, and involves the eighteenth through
twentieth residues with backbone bond angle
conformations 14, 10, and 18. The remaining outer



WO 91/16683 



PCT/US91/02786 






turn is shown to involve the twenty-sixth
through twenty-ninth residues in bond angle
conformations 12, 14, 14 and 8. The remainder of the
bond angle states are all 16-type states. Thus, a
planar β-sheet is assumed. Within an anti-parallel
β-hairpin, the α-carbons are shifted with respect to
each other by one lattice unit. This allows for the
interdigitation of the side chains mentioned above.
In the fully native conformation, there are twenty
contacts between neighboring sidechains (i.e., twenty
pairs of sidechain interacting sites that are a
distance of $\sqrt{2}$ from each other).

In the conformation considered here, the
pattern of hydrophobic and hydrophilic residues is
the same. The model chain consists of N=37 residues.
In each of the strands, all of the even (odd)
residues are hydrophobic (hydrophilic). The first
strand consists of the first through eighth residues.
The ninth through eleventh turn residues are all
hydrophilic. The second strand runs from the twelfth
to the eighteenth residue, with all the even (odd)
residues hydrophobic (hydrophilic). The nineteenth
and twentieth turn residues are, respectively,
hydrophilic and hydrophobic. The third strand runs
from the twenty-first to the twenty-sixth residue.
The twenty-seventh through twenty-ninth are turn
residues, all of which are hydrophilic. The fourth
strand runs from the thirtieth to the thirty-seventh
residue. The first and last residues (one and
thirty-seven) are virtual residues (i.e., they are
devoid of sidechains, but they do occupy excluded
volume). They may be regarded as capping the two
ends, and are included so that the bond angle state
for the real residues (the second and thirty-sixth
residues) may be defined.

Turning now to the subject of equilibrium
folding, the requirements for equilibrium folding of
a region of the chain to its unique, native structure
(e.g., the four-member β-barrel structure) are
described. The interplay of an intrinsic native turn
propensity and a short- and medium-range preference
for β-sheet formation is described.

In one simulation operation, for the
sequence $B_1(7)b_1(4)B_2(6)b_2(3)B_3(5)b_3(4)B_4(8)$, the
parameter $\epsilon_\theta(16)$ was found equal to zero for the $B_i$
state and -0.25/T for all the other states. For the
$B_i$ state, the parameter $\epsilon_\phi(16,16,37)$ = 0.6/T, and is
zero for all other states. For the turns $b_i$: $\epsilon_\theta$ = 0
for the native conformation, and $\epsilon_\theta$ = 0.25/T for all
other conformations. Similarly,

= .6(1.75)/T = -1.05/T. e phU . phL1 = -25/T, 
Sphu-phob = WT, and e phob . phob = -.75/T. The 
cooperativity parameter $\epsilon_c$ = -0.15/T. In the native
conformation, the total short range free energy
$E_\theta$ = 0, the total torsional energy $E_{tor}$ = -25.8/T, the
total sidechain interaction free energy arising from
hydrophobic interactions $E_{side}$ = -14.25/T, and the
cooperative interaction free energy $E_c$ = -11.25/T.
Thus, the total energy of the native state
$E_N$ = -51.3/T. A summary of the conformational
properties of this sequence, as well as all the other
types of primary sequences, is presented in Table II.
The primary sequence is designated by a shorthand
notation $\epsilon_\alpha > \epsilon_\beta$; 1.; 1.75. This notation indicates
that, based on bond angle preferences,
β-conformations are locally preferred for the $B_i$
portions of the primary sequence, and that the
torsional angle preference (for native-like
conformations in the $B_i$ region) is locally favored by
a ratio of 1:1.75 over that in the turn region.
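The native-state total quoted above is the sum of the four contributions, which can be checked with a short calculation. The dictionary keys and the function name are illustrative; the numerical coefficients are the ones stated in the text, each understood as a multiple of 1/T.

```python
NATIVE_TERMS_TIMES_T = {
    "E_theta": 0.0,     # short range (bond angle) term
    "E_tor": -25.8,     # torsional term
    "E_side": -14.25,   # hydrophobic sidechain term
    "E_c": -11.25,      # cooperative term
}

def native_energy(temperature):
    """Total native-state energy E_N = (sum of terms) / T."""
    return sum(NATIVE_TERMS_TIMES_T.values()) / temperature
```

At T = 1 the function returns the coefficient -51.3 (up to floating-point rounding), in agreement with the stated total.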
TABLE II

Compilation of Selected Folding Results

Sequence                                          No. of Folding   No. of   Intrinsic Turn
                                                  Attempts         Folds    Probability

$\epsilon_\alpha > \epsilon_\beta$; 1.; 1.75      5                5        0.0046
$\epsilon_\alpha > \epsilon_\beta$; 1.; 1.5       6                4        0.0021
1.; 1.5                                           6                6        0.0025
$\epsilon_\alpha = \epsilon_\beta$; 0.5; 1.5      7                5        0.0093
$\epsilon_\alpha = \epsilon_\beta$; 0.; 1.5       11               5        0.063
$\epsilon_\alpha = \epsilon_\beta$; 0.; 1.75      10               10       0.14
$\epsilon_\alpha < \epsilon_\beta$; 1.6; 0.05     11               11       0.036
$\epsilon_\alpha = \epsilon_\beta$; 1.; 0         14               0        $5 \times 10^{-5}$

In the absence of long-range interactions,
there is a negligible intrinsic preference for the
native conformation. To address this point,
reference is made to equations 2-5. Using equations
2-5, the transition midpoint (including tertiary
interactions) is predicted to be near T = 0.576.
Employing equation (3), it is found that at this
temperature $A_D$ = -88.44, and that $E_N$ (without
tertiary interactions) equals -44.79. The fraction
of molecules in the native conformation which would
be present if all tertiary interactions are turned
off (that is, the equilibrium population based on
short and medium range interactions embodied in $E_\theta$
and $E_{tor}$ alone) is given by

$$f^0_N = \exp(-E_{tor}) / \exp(-A_D) \qquad (6)$$

Using equation 6, $f^0_N$ is found to have the value:

$$f^0_N = 1.11 \times 10^{-19}$$
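The quoted value can be reproduced from the two numbers given in the text. The sketch below evaluates equation (6) as a single exponential of the difference, which is algebraically the same as the ratio; the constant and function names are illustrative. Note that -44.79 is the torsional total -25.8/T evaluated at T = 0.576, with the bond angle term equal to zero in the native state.

```python
import math

A_D = -88.44      # denatured-state free energy at T = 0.576 (from the text)
E_LOCAL = -44.79  # native energy without tertiary interactions (from the text)

def intrinsic_native_fraction(e_local, a_d):
    """Equation (6): exp(-E) / exp(-A_D), evaluated as one exponential."""
    return math.exp(-(e_local - a_d))
```

With the inputs above, the result is about 1.1 x 10^-19, in line with the figure quoted in the text to within the rounding of the inputs.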

Thus, there appears to be a negligible
preference for the native state in the absence of
long range interactions, suggesting that finding
the native conformation is by no means guaranteed by
the above choice of short and medium range
interaction parameters. Rather, this chain will
thrash about until it finds the native state.

The next subject described below is the
nature of the conformational transition itself. In
Figs. 13A-C, the average number of native contact
pairs between sidechains ($N_c$) versus time is plotted
for a chain under denaturing conditions at T = 0.6,
in the thermal transition region at T = 0.58, and
under strongly renaturing conditions at T = 0.545.
The times indicated in the figure are in units of 500
Monte Carlo steps, and the fully native molecule
contains twenty contact pairs. Under denaturing
conditions, $N_c$ fluctuates around zero, characteristic
of a relatively short, unfolded chain. In the
transition region, the system starts out unfolded,
and then around t/5000 = 118, it undergoes a rapid
transition in about 6,500 Monte Carlo time steps to
the fully native molecule. For the remainder of the
time, it stays in the native state. Other
conformational properties (not shown), such as the
energy, the instantaneous value of the radius of
gyration, and the total number of contact pairs $N_{c,tot}$, also
undergo sharp changes in value that are
characteristic of an all-or-none transition (i.e., a
transition where the intermediates between the
denatured and fully folded states are marginally
populated). On further cooling to T = 0.545, the
chain becomes fully native, with minor oscillations
in $N_c$ arising from the fluctuations of the end
residues of the chain.
Decreasing the turn propensity for native-
like states decreases the stability of the native
conformation and decreases the transition
temperature. In the transition region, however, not
only are native in-register four-member β-barrels
observed, but so are out-of-register conformations in
which one of the exterior strands is two residues
out-of-register, shifting the native contact between
sidechains two and thirty-six to a non-native contact
of residues two through thirty-four in one case, and
to a non-native sidechain contact of residues four
through thirty-six in the other case. In the former
case, the outer turn began at residue twenty-five
instead of residue twenty-six, thus pushing the
outer strand beyond the end of the barrel; and in the
latter case, the turn began at residue twenty-eight
and involved five residues, producing a bulge. Out
of a total of six conformational transitions to a
folded state, four folded directly to the native
conformation, and two produced the out-of-register
states described above.

The out-of-register state associated with
residues four through thirty-six occurred at
relatively high temperature and folded in about
65,000 Monte Carlo steps. It remained folded for
315,000 time units before unfolding in about 165,000
time units.

Many out-of-register conformations have the
same number of contacts between hydrophobic
sidechains as in the native state; they differ in the
cooperative free energy between the strands and in
the local conformational preferences. Dropping the
turn preference increases the population of these
out-of-register states. It is seen, therefore, that
in the absence of some intrinsic preference for
secondary structure, many in-register and out-of-
register conformations can be generated, and it is
the marginal intrinsic turn propensities which act to
select from among them one conformation as the unique
folded form. Based on tertiary interactions between
hydrophobic sidechains alone, many otherwise
degenerate conformations can be generated. Here, a
marginal preference for β-strand secondary structure
plus the presence of turn neutral regions are
required for folding to occur to a unique native
state. Here, turn propensities of 1% or lower (see
below, and Table II) are sufficient to yield folding
to the native barrel of Figure 12.

It has been found that as the local
propensity for β-states decreases, there is an
increasing population of non-native turns and out-of-
register states, even though the native turn
population increases as T decreases. To fold the
system to the global free energy minimum that
corresponds to the native conformation, therefore,
the free energy of out-of-register conformations
should be increased relative to in-register
conformations. As the local preference for β-states
decreases, it becomes easier to form non-native
turns; this appears to be the origin of the out-of-
register states. Therefore, since the number of
contacts between sidechains is approximately the same
for the in-register and out-of-register cases, what
determines the native conformation is the number of
cooperative-type interactions, $\epsilon_c$, plus the
differences in local conformational preferences.
Where the local preference difference is decreased, a
number of out-of-register states that are in deep
local minima is observed.

For a primary sequence of the type $\epsilon_\alpha = \epsilon_\beta$;
0.; 1.5 (which is similar to the above cases, except
that the torsional potential in the putative β-strand
is disregarded), the β- and α-states are locally
isoenergetic. The particular sequence of the $AB_i$
stretches is induced by tertiary interactions. In
all cases, the folded conformations turn out to be
β-barrels. Thus, tertiary interactions taken with
local turn propensities provide for selection of
β-collapsed states. Where the transition temperature
is reduced, the native turn populations become
greater. For example, the calculated turn population
of native turn one is about 10% at T = 0.40. Based
on tertiary interactions alone, the unique native
state is not achieved. This is most likely due to
the degeneracy in sidechain contacts between the in-
register and the two-residue out-of-register
conformers. If native turn propensity is
sufficiently augmented, it appears that tertiary
interactions plus intrinsic turn propensities are
sufficient to yield the unique native state.
Further, if the short-range interactions favoring β-
strand formation are decreased, turn formation at a
non-native location becomes more likely and, thus,
the intrinsic turn propensity must be augmented (see
Table II) to ensure the recovery of a unique
conformational state.

Next examined was the nature of the conformational
transition for sequences of the type
$A_1(7)b_1(4)A_2(6)b_2(3)A_3(5)b_3(4)A_4(8)$; that is, molecules
having the sequence $\epsilon_\alpha < \epsilon_\beta$; 0, 1.6; 0.05. These are model
proteins whose β-strands in the denatured state
locally favor α-helix conformation, but whose amino
acid pattern still consists of alternating
hydrophobic and hydrophilic residues. For $A_i$, it has
been found that $\epsilon_\theta(12)$ = 0, $\epsilon_\theta(16)$ = 0.05/T, and for
all the others $\epsilon_\theta$ = 0.25/T. Furthermore, it was found



that - 0 for all the residues in A t . These 
systems (where the local preference is for an α-helix
conformation but the global free energy minimum
conformation is that of a β-strand) spend substantial
time trapped in relatively deep local minima. As the
local preference for helical conformations is
increased in the putative β-strand forming regions,
while the unique four-member β-barrel is sometimes
obtained, the chain generally thrashes about for
many millions of time steps (e.g., over 50 million)
without finding a unique folded form.

An important indication from these
simulation results is that a marginal local turn
preference plus tertiary interactions are sufficient
to produce unique native conformations, even in the
extreme situation where the local conformational
preference is for helices rather than β-sheet. If
the native conformation is in thermodynamic
equilibrium, then it is deemed to be at the lowest
free energy state (conformation), independent of how
the free energy is divided. That is, while it is
conceptually convenient to divide the free energy
into short-, medium- and long-range interaction
contributions, it is the sum of these contributions,
i.e., the total free energy, that determines the
equilibrium conformation. The approach taken by the
simulations shows that the local minima problem can be
surmounted to recover the lowest free energy
structure, which overrides local considerations if
there is a marginal turn propensity for native-like
turns. Thus, turns appear to play an extremely
important role in determining the ability to recover
a unique native conformation.
Folding Pathway (Trajectory)

Turning now to a discussion of the folding
pathway, it is seen that the sequence defines an
observable pathway (trajectory) as it folds and makes
the transition from its denatured (unfolded)
conformation state to its native (folded)
conformation state. A trajectory of a sample having
the primary sequence $\epsilon_\alpha > \epsilon_\beta$; 1.; 1.75 is shown in
Figs. 14A-B. The conformations at different times
are shown at different orientations that aid in the
visualization of the folding process. At t = 585,800
Monte Carlo time units, folding is seen to initiate
from the central turn 115 between the β-hairpin
composed of strands two 117 and three 119. (Folding
is not unidirectional. β-strands may dissolve, as
well as form, during the course of assembly.) If the
conformation at t = 585,900 is compared with that at
t = 586,000, it will be seen that a slight
dissolution of the β-hairpin 121 has occurred. By t
= 586,300, the first β-hairpin 121 is almost fully
assembled. However, by t = 586,550, the majority of
one of the two strands in the β-hairpin dissolves
and then reforms at t = 586,600. Then, there is a
pause as the random coil tail 123 thrashes about,
until the next native-like turn 125 forms.
By t = 587,700, three of the four β-strands 117, 119,
127 are essentially in place. Thus, assembly to the
three-member β-barrel intermediate takes 1,900 time
steps from the beginning of folding. Throughout this
process, the excluded volume of the chain hinders
assembly. Most of the configurations of the
denatured tail are nonproductive; the tail thrashes
about until t = 591,800, when it works its way into a
position(s) that permits native state assembly.
After that, the assembly becomes more rapid and, by
t = 592,250, the fully folded molecule forms. Thus,
the three-member β-barrel is the long-lived
intermediate, living for 4,550 time steps or 71% of
the total elapsed time from the start of folding.
The mechanism of assembly is best described as
punctuated, on-site construction.
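The timing figures quoted for this trajectory follow from three time stamps in the text, as the short check below shows. The constant and function names are illustrative only.

```python
T_START = 585_800          # folding initiates at the central turn
T_THREE_STRANDS = 587_700  # three of the four strands essentially in place
T_FOLDED = 592_250         # fully folded molecule forms

def intermediate_lifetime():
    """Return (assembly time, intermediate lifetime, lifetime fraction)."""
    assembly = T_THREE_STRANDS - T_START
    lifetime = T_FOLDED - T_THREE_STRANDS
    return assembly, lifetime, lifetime / (T_FOLDED - T_START)
```

The first two values are 1,900 and 4,550 time steps, and the fraction rounds to 71% of the 6,450 steps that elapse from the start of folding.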

With respect to unfolding of a tertiary
structure, in all instances unfolding is the reverse
of folding. Typically, unfolding starts with either
one of the external strands becoming denatured or an
internal strand closest to the denatured tail becoming
unfolded.

Computer System and Method

Referring now to Figs. 3 and 15, a system
and method are shown and described for simulating
protein folding and determining three-dimensional
(tertiary) structures of proteins.

The system comprises an input means 57 such
as a keyboard for specifying (entering) selected
amino acid sequences and other data such as
temperature and fold preferences, a RAM (random
access memory) 59 for storing such data, a ROM (read-
only memory) 61 with a stored program, a CRT (cathode
ray tube) display unit 63 and/or printer 65, an
optional auxiliary disk storage device 67 for storage
of relevant data bases, and a microprocessor 69 for
performing, under control of the stored program, the
steps of processing the entered data, simulating the
folding of the protein from its unfolded state to its
folded (tertiary) state, and displaying via the
display unit (or printer) tertiary conformations of
the protein in three dimensions.

35 A user enters the amino acid sequence data 






file from the auxiliary storage unit). In response
to entry of the sequence data, the system inputs
(specifies) the data for processing, stores the data
in memory, then processes it as shown in Figs. 15A-F.
Sample data of the type which may be input to the 
system is shown in Appendix E. In processing the 
data, the system generates a tertiary interaction 
matrix as shown in Appendix E and produces, in 
addition to a display of the protein's tertiary 
structure, a sample output as shown in Appendix E for 
tracking the simulation. As indicated above, the 
system operates under the control of a stored 
program. A listing of the program is shown in 
Appendix D. 

Turning now to Figs. 15A-F, in response to
the specified data the system generates a random
conformation of backbone and sidechain elements
(residues). It does this by generating a set of
random bond angles, then generating the coordinates
of the backbone and sidechains as a starting chain in
a 210 lattice (Fig. 4). The system then checks to
determine if the excluded volume criterion is met,
after which it constructs an interaction table, a
sample of which is shown in Appendix E. It proceeds
to construct the interaction table by first
establishing respective bond angle preferences, then
establishing dihedral (rotational) angle preferences,
followed by establishing side-chain interaction
criteria. The system then stores the temperature,
bond angle, lattice coordinates, preferences, and
interaction data in a table or matrix like that shown
in Appendix E. Thereafter, the system reads the data
from the table and constructs, by means of Monte
Carlo simulation, a random conformation; following
which, the system calculates the total energy of the
conformation, represented as $E_{old}$. Thereafter, the
system selects (at random by Monte Carlo simulation)
a pair of bond vectors for rotation. It then checks
if the rotation would violate the excluded volume
criterion. If it would, the rotation is not
attempted, and the system proceeds to the next step.
If it would not violate the excluded volume
criterion, another check is made to determine if the
bond angles subtended by the bond vectors are between
14 and 18; if they are, it attempts the rotation.
Otherwise, it does not attempt the rotation and
proceeds to the next step. In performing rotation,
the system modifies the conformation by interchanging
a randomly selected pair of bond vectors. In other
words, it proceeds to change the rotation angle $\phi$.
Thereafter, the system proceeds to determine the
coordinates of the lowest energy conformation which
satisfy the Metropolis criterion. It does this by
first calculating the total energy ($E_{new}$) of the new
modified conformation, then comparing the total energy
$E_{new}$ with the total energy of the old conformation,
$E_{old}$. If $E_{old}$ is greater than $E_{new}$, then the
coordinates of the old conformation are replaced with
the coordinates of the new conformation. The system
then proceeds to the next step (step B), which is
described below. If $E_{old}$ is not greater than $E_{new}$,
then, in compliance with the Metropolis criterion, a
random number R is generated and the probability

$$P = \exp\left(-\frac{E_{new} - E_{old}}{k_B T}\right)$$

is calculated. That probability is compared with the
random number R. If R is less than P, the
coordinates of the old conformation are replaced with
the coordinates of the new conformation and the
system proceeds to the next step (step B). If,
however, R is not less than P, the system directly
proceeds to the next step (step B). At the next
step, the system proceeds to choose a bead at random
to move within the lattice. Before moving the bead,
the system tests if the move (which is a jump-type
move) would violate the excluded volume criterion.
If no, it proceeds with the move. If yes, it does
not proceed with the move, and proceeds instead to
choose the next bead until all the beads in the chain
have been checked for modification (movement). If
the move would not violate the excluded volume
criterion, the conformation is modified by moving the
bead to a new lattice site. In other words, the bead
would make a jump move which would change its
coordinates and associated bond angle $\theta$. After the
move is made and the conformation is modified
thereby, the system calculates the total energy of
the new conformation, that is, the total energy $E_{new}$,
in a similar manner as indicated earlier. $E_{new}$ is
then compared with $E_{old}$, the energy of the previous
conformation before the move. If $E_{old}$ is greater than
$E_{new}$, then the coordinates of the old conformation are
replaced with the coordinates of the new, and the
next bead move is checked. If $E_{old}$ is not greater than
$E_{new}$, then the Metropolis criterion is applied (the
random number R is generated, and the probability
P is calculated in the same manner as indicated
earlier, as shown in Figs. 15A-F), and the random
number R is compared with the probability P. If R is
less than P, the coordinates of the old conformation
are replaced with the coordinates of the new and the
next bead move is checked. If R is not less than P,
the next bead move is checked and the loop is
repeated until all bead moves (i.e., the moves of all
n beads) have been checked, at which time if all bead
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moves have been checked the system proceeds to the 
next step (step C) . At this next step, the system 
proceeds to process the two end beads. It identifies
the first end bead, then checks whether an end-flip-type
move would violate the excluded volume criterion. If
it would, the system aborts the move and proceeds to
check the second end bead. If the move of the first
end bead would not violate the excluded volume
criterion, the system modifies the conformation by
performing an end-flip move that changes the
coordinates of the end bead. It then determines the
coordinates of the lowest energy conformation which
satisfies the Metropolis criterion in the same manner
as it did for the rotational and jump-type moves.
After determining these coordinates, the system
checks whether both end beads have been processed.
If the second end bead remains to be processed, the
system identifies the second end bead and checks
whether an end-flip move of the second end bead
would violate the excluded volume criterion. If it
would violate the criterion, then, both end beads
having been considered, the system proceeds to the
next step (step D). If it would not violate the
criterion, the system modifies the conformation by
performing an end-flip move that changes the
coordinates of the second end bead. It then
determines the coordinates of the lowest energy
conformation which satisfies the Metropolis
criterion, after which it proceeds to the
next step (step D). At this next step, the system
selects a bond at random, then searches for a U-shaped
segment. After finding the U-shaped segment, it
checks whether a translation (wave-motion) type move
would violate the excluded volume criterion. If the
move would violate the criterion, the system aborts
the move and proceeds to check whether all the
jump-type moves have been made. If all have been
made, it proceeds to the next step (step E).
If the move would not violate the excluded
volume criterion, the system modifies the
conformation by performing the translation
(wave-motion) type move, which changes the
coordinates of the beads defining the U-shaped
segment. The system then determines the coordinates
of the lowest energy conformation which satisfies the
Metropolis criterion, after which it checks whether
all the jump-type moves have been made. If not all
the jump-type moves have been made (completed), it
starts the loop again. One complete loop is
represented by one rotational move, n jump-type
moves, two end-flip moves, and one U-shaped move.
After the loops have been completed and
all moves made and/or aborted, the system checks
whether the chain is still positioned near the
center of the lattice. If it is not, the system moves
the chain to the center of the lattice and adjusts
its coordinates accordingly. Thereafter, the system
displays a three-dimensional representation of the
protein structure and repeats the process a
predetermined number of times. If the chain has
remained near the center of the lattice, the system
proceeds immediately to display the three-dimensional
representation of the protein, then repeats the
process. After the three-dimensional coordinates of
the tertiary protein structure are generated for
display, a graphics program such as SYBYL (which is
commercially available from Tripos Associates of St.
Louis, Missouri) is used by the system to display the
tertiary structure corresponding to the coordinates.
Sample display output is presented in Fig. 1. Sample
printed output is presented in Appendix E.
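
The acceptance test and the composition of one loop described above can be sketched as follows. This is only an illustrative Python sketch, not the program of Appendix D: the function names are hypothetical, and the form of P (unity for a move that lowers the energy, a Boltzmann factor otherwise) is an assumption, since the calculation of P is given elsewhere in the specification.

```python
import math
import random


def metropolis_accept(e_old, e_new, kT, rng=random):
    """Accept a trial move when a uniform random number R is less than P."""
    delta = e_new - e_old
    # Assumed form of P: 1 for downhill moves, exp(-dE/kT) for uphill ones.
    p = 1.0 if delta <= 0.0 else math.exp(-delta / kT)
    return rng.random() < p


def attempt_move(conf, trial, violates_excluded_volume, energy, kT, rng=random):
    """Return the conformation kept after one trial move."""
    if violates_excluded_volume(trial):
        return conf  # move aborted, old coordinates retained
    if metropolis_accept(energy(conf), energy(trial), kT, rng):
        return trial  # old coordinates replaced by the new ones
    return conf


def cycle_schedule(n_beads):
    """Move types making up one complete loop, in the order described."""
    return ["rotation"] + ["jump"] * n_beads + ["end_flip"] * 2 + ["u_shaped"]
```

Each entry of the schedule would be passed through `attempt_move` with a move-specific trial conformation; the excluded volume check always precedes the energy comparison, as in the text.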

An alternative embodiment of the system is
presented in Fig. 16, comprising a keyboard 151 for
entering data representing temperature and amino acid
sequences, a RAM 153 for storing the entered data,
and a unit 155 for generating a representation of a
lattice, including a unit 157 for positioning lattice
sites and a unit 159 for positioning α-carbons
relative to the lattice sites. The system includes a
unit 161 for combining the generated lattice
representation and the sequence of residues, a unit
163 for producing representations of protein
structures, and a unit 165 for comparing the protein
structure representations to a predetermined
criterion and for selecting one of the protein
structure representations for display.



APPENDIX A

The following is a description of various lattice model
rules which must be followed for constructing conformations of
various sidechains linked to various backbone configurations.

As shown in Figure 8, let the i-th bond vector $\mathbf{b}_i$ connect
α-carbons (i-1) and (i). Then, for a given backbone
conformation, $r_\theta^2$ may be defined as follows:

$$r_\theta^2 = (\mathbf{b}_i + \mathbf{b}_{i+1})^2$$

On the 210 lattice, the allowed values of $r_\theta^2$ are 6, 8, 10, 12, 14, 16
and 18. Any other value of $r_\theta^2$ is rejected as not realistic or
not representable on the 210 lattice. For a given backbone
conformation, four sidechain vectors are constructed. The center
of sidechain interaction is located at the site defined by a
diamond lattice vector d 34, of the type (±1,±1,±1), which points
from the center of the α-carbon to the point (±1,±1,±1). The
other three vectors $\mathbf{f}_1$, $\mathbf{f}_2$ and $\mathbf{f}_3$ 36, 38, 40 are of the fcc type,
whose sum is twice that of the diamond lattice vector d 34. The
vector d has left-handed chirality (L). With respect to the
backbone, vector d points toward the N-terminus of the sequence.
The orientation angle is generally not less than 60°.

Pseudovector p is defined as the cross-product of $\mathbf{b}_{i+1}$
and $\mathbf{b}_i$:

$$\mathbf{p} = \mathbf{b}_{i+1} \times \mathbf{b}_i$$

and w is defined as:

$$\mathbf{w} = \mathbf{b}_i - \mathbf{b}_{i+1}$$

The general procedure for the calculation of $\mathbf{d}$, $\mathbf{f}_1$, $\mathbf{f}_2$ and $\mathbf{f}_3$ is
given as follows: If $\mathbf{d} = (d_x, d_y, d_z)$, then

$$\mathbf{f}_1 = (d_x, d_y, 0), \qquad \mathbf{f}_2 = (d_x, 0, d_z), \qquad \mathbf{f}_3 = (0, d_y, d_z)$$

In the following, use is made of the function isgn(x), where:

$$\mathrm{isgn}(x) = \begin{cases} 1, & x > 0 \\ -1, & x < 0 \end{cases}$$

If $r_\theta^2 = 14$, then

$$d_x = \mathrm{isgn}(p_x), \qquad d_y = \mathrm{isgn}(p_y), \qquad d_z = \mathrm{isgn}(p_z)$$

If $r_\theta^2 = 8$, 12 or 16, then

$$d_x = \mathrm{isgn}(p_x - 2b_{x,i+1}), \qquad d_y = \mathrm{isgn}(p_y - 2b_{y,i+1}), \qquad d_z = \mathrm{isgn}(p_z - 2b_{z,i+1})$$

where $\mathbf{b}_{i+1} = (b_{x,i+1}, b_{y,i+1}, b_{z,i+1})$.

If $r_\theta^2 = 6$ or 10, then

$$d_x = \mathrm{isgn}(p_x + w_x), \qquad d_y = \mathrm{isgn}(p_y + w_y), \qquad d_z = \mathrm{isgn}(p_z + w_z)$$

And, if $r_\theta^2 = 18$, and if $p_x \cdot p_y \neq 0$, then

$$d_x = \mathrm{isgn}(p_x), \qquad d_y = \mathrm{isgn}(p_y), \qquad d_z = \mathrm{isgn}(p_z).$$

Otherwise,

$$d_x = \mathrm{isgn}(p_x + w_x), \qquad d_y = \mathrm{isgn}(p_y + w_y), \qquad d_z = \mathrm{isgn}(p_z + w_z).$$
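
The sidechain rules of this appendix can be sketched in Python as follows. The sketch is illustrative only and its function names are hypothetical. One assumption is labeled in the code: isgn(0) is taken as +1, which is what the Fortran intrinsic ISIGN(1,0) used in Appendix D returns; the definition above leaves x = 0 unspecified.

```python
def isgn(x):
    # Assumption: zero maps to +1, matching Fortran ISIGN(1, 0).
    return 1 if x >= 0 else -1


def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def sidechain_vectors(b_i, b_next):
    """Return (r2, d, (f1, f2, f3)) for consecutive bond vectors b_i, b_{i+1}."""
    r2 = sum((a + b) ** 2 for a, b in zip(b_i, b_next))
    if r2 not in (6, 8, 10, 12, 14, 16, 18):
        raise ValueError("bond pair not representable on the 210 lattice")
    p = cross(b_next, b_i)                        # p = b_{i+1} x b_i
    w = tuple(a - b for a, b in zip(b_i, b_next))  # w = b_i - b_{i+1}
    if r2 == 14 or (r2 == 18 and p[0] * p[1] != 0):
        v = p
    elif r2 in (8, 12, 16):
        v = tuple(pc - 2 * bc for pc, bc in zip(p, b_next))
    else:  # r2 of 6 or 10, or the remaining r2 = 18 case
        v = tuple(pc + wc for pc, wc in zip(p, w))
    d = tuple(isgn(c) for c in v)
    f1, f2, f3 = (d[0], d[1], 0), (d[0], 0, d[2]), (0, d[1], d[2])
    return r2, d, (f1, f2, f3)
```

For example, `sidechain_vectors((2, 1, 0), (0, 1, 2))` falls in the r² = 12 branch, and in every case the three fcc vectors sum to twice d, as stated above.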
APPENDIX B

A generalized master equation is shown below:

$$\frac{\partial p(\{i\},t)}{\partial t} = \sum_{\{i'\}} \left[ k_f\, p(\{i\}|\{i'\})\, q(\{r_i\}) - k_b\, p(\{i'\}|\{i\})\, q(\{r'_i\}) \right] \qquad (1)$$

where

{i} represents a first set of vectors;

{i'} represents a second set of vectors;

p({i},t) represents the probability of finding a set of
vectors {i} at a time t;

$k_f$ represents rate of increase of the set {i} in size
(membership) due to move of bead from set {i'}
to set {i};

$k_b$ represents rate of decrease of the set {i} in size
due to move of bead to set {i'} from set {i};

$\{r_i\}$ and $\{r'_i\}$ represent coordinates of the set of
bond vectors {i} and {i'};

$q(\{r_i\})$ represents an excluded volume function

    = 1; if $\{r_i\}$ are unoccupied

    = 0; if $\{r_i\}$ are occupied

p({i}|{i'}) represents the probability of occupying set {i}
upon moving from set {i'};

p({i'}|{i}) represents the probability of occupying set {i'}
upon moving from set {i};

and the relationship between $k_f$ and $k_b$ may be expressed as:

$$\frac{k_f}{k_b} = \exp\left(-\frac{U\{i\} - U\{i'\}}{k_B T}\right)$$

where U{i} represents the total energy of the protein
in the i-th conformation;

U{i'} represents the total energy of the protein
in the i'-th conformation;

$k_B$ represents Boltzmann's constant; and

T represents temperature (in degrees Kelvin) of the
protein.

A bead represents an amino acid residue comprising a full
sidechain (i.e., four lattice sites) and backbone segment (i.e.,
seven lattice sites). A bead is shown, for example, in Figures 5
and 9. In terms of the above equation, the probability of
finding a set of vectors {i,i+1} at a time t in a two-bond
jump-type move of a bead from one coordinate position ($r_i$) to another
coordinate ($r'_i$) may be expressed as:

$$\frac{\partial p(\{i,i+1\},t)}{\partial t} = \sum_{i',i'+1} \left[ k_f\, P(i,i+1\,|\,i',i'+1;\theta)\, q(r_i) - k_b\, P(i',i'+1\,|\,i,i+1;\theta)\, q(r'_i) \right]$$

where,

i and i+1 represent a first pair of vectors;

i' and i'+1 represent a second pair of vectors; and

θ represents the bond angle between vectors (bonds)
i and i+1 and between i' and i'+1.

In addition to the single-bead jump-type move described
above, a conformation may be modified by rotational and/or
translational motion of one or more beads, as shown for example
in Figures 10 and 11.
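
The two ingredients of the master equation that lend themselves to direct computation, the rate ratio and the excluded volume function q, can be sketched in Python as below. The function names and the representation of occupied sites as a set of coordinate tuples are assumptions made for illustration; they do not come from the program of Appendix D.

```python
import math


def rate_ratio(u_i, u_i_prime, kT):
    """k_f / k_b = exp(-(U{i} - U{i'}) / (k_B T)), with kT = k_B * T."""
    return math.exp(-(u_i - u_i_prime) / kT)


def q(target_sites, occupied):
    """Excluded volume function: 1 if every target site is free, else 0."""
    return 0 if any(site in occupied for site in target_sites) else 1
```

A move into a lower-energy set {i} gives a ratio above one, and q vetoes any move whose target sites are already taken regardless of the energies.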
APPENDIX C

Calculation of the Denatured State Free Energy

In this appendix, an expression for the free energy of the
unfolded state of a model protein confined to a 210 lattice is
calculated. Two cases are examined. The first corresponds to
the situation when the torsional potential $\epsilon_\phi$ equals zero, and
the second corresponds to the more general case when $\epsilon_\phi$ is
non-zero.

With respect to the lattice, each of the twenty-four
possible vectors connecting the lattice sites may be given a
number one through twenty-four, as follows:

     1=(2,1,0)        13=(0,-1,-2)
     2=(2,0,1)        14=(0,-2,-1)
     3=(2,-1,0)       15=(0,1,-2)
     4=(2,0,-1)       16=(0,-2,1)
     5=(1,2,0)        17=(-1,2,0)
     6=(1,0,2)        18=(-1,0,2)
     7=(1,-2,0)       19=(-1,-2,0)
     8=(1,0,-2)       20=(-1,0,-2)
     9=(0,1,2)        21=(-2,1,0)
    10=(0,2,1)        22=(-2,0,1)
    11=(0,-1,2)       23=(-2,-1,0)
    12=(0,2,-1)       24=(-2,0,-1).



To specify the conformation of the chain, given the location
of the first bead, a sequence of N-1 numbers, ranging from 1 to
24, is specified. With the first bond vector (vector 1) chosen
arbitrarily as vector (2,1,0), the second vector must satisfy the
constraint $6 \le r_\theta^2 \le 18$. There are 18 such possibilities. There
are four states such that $r_\theta^2 = 6$: the second vector can be
(0,-2,±1) or (-1,0,±2). There are two such possibilities when
$r_\theta^2 = 8$, namely (0,-1,±2). When $r_\theta^2 = 10$, there are two
possibilities as well, (-1,2,0) and (1,-2,0). If $r_\theta^2 = 12$, again,
there are two possibilities, with the allowed second vectors being
(0,1,±2). Turning to the $r_\theta^2 = 14$ case, there are a total of four
possibilities, (0,2,±1) and (1,0,±2). If $r_\theta^2 = 16$, there is one
possibility, (2,-1,0). Finally, for $r_\theta^2 = 18$, there are three
possibilities, (2,0,±1) and (1,2,0). In general, for a given
vector number i, there are eighteen allowed vectors; subsequent
allowed vectors vary depending on the particular vector that
precedes them.
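
The counts quoted above can be checked by direct enumeration. The following Python sketch is illustrative (the function names are hypothetical); it lists the 24 vectors in the numbering given earlier in this appendix and tallies the allowed second vectors by their value of $r_\theta^2$.

```python
from collections import Counter

# The 24 bond vectors of the 210 lattice, in the numbering used above.
VECTORS = [
    (2, 1, 0), (2, 0, 1), (2, -1, 0), (2, 0, -1),
    (1, 2, 0), (1, 0, 2), (1, -2, 0), (1, 0, -2),
    (0, 1, 2), (0, 2, 1), (0, -1, 2), (0, 2, -1),
    (0, -1, -2), (0, -2, -1), (0, 1, -2), (0, -2, 1),
    (-1, 2, 0), (-1, 0, 2), (-1, -2, 0), (-1, 0, -2),
    (-2, 1, 0), (-2, 0, 1), (-2, -1, 0), (-2, 0, -1),
]


def r2(a, b):
    """Squared length of the sum of two consecutive bond vectors."""
    return sum((x + y) ** 2 for x, y in zip(a, b))


def allowed_successors(first):
    """Vectors that may follow `first` (6 <= r2 <= 18)."""
    return [v for v in VECTORS if 6 <= r2(first, v) <= 18]


def successor_counts(first):
    """Number of allowed successors of `first` for each value of r2."""
    return Counter(r2(first, v) for v in allowed_successors(first))
```

Calling `successor_counts((2, 1, 0))` reproduces the breakdown in the paragraph above, and `allowed_successors` returns eighteen vectors whichever of the 24 vectors is chosen first.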

A pseudo inner product may be defined (by analogy to orthonormal
basis sets) as follows:

$$\langle i, j \rangle = 1 \qquad \text{(C-2)}$$

if the two vectors i and j are allowed, and

$$\langle i, j \rangle = 0 \qquad \text{(C-3)}$$

if the two vectors i and j are not allowed.

Denatured state partition function, $\epsilon_\phi = 0$

In the absence of a torsional potential that serves to
couple adjacent bond angle states (and which, therefore,
introduces cooperativity into the model), the internal partition
function of the denatured state, $Z_D^0$, may be obtained from

$$Z_D^0 = \mathbf{J}^* \left[ \prod_{i=2}^{N-1} \mathbf{U}_{D,i} \right] \mathbf{J} \qquad \text{(C-4)}$$

where $\mathbf{J}^*$ is a row vector of dimension 24, consisting of a 1
followed by twenty-three zeros, $\mathbf{J}$ is a column vector consisting
of twenty-four ones, and $\mathbf{U}_{D,i}$ is a 24 x 24 matrix associated with
the i-th residue, each row of which contains 18 non-zero elements
and 6 zero elements. $\mathbf{U}_{D,i}$ may be expressed as:

$$U_{D,i}(k,l) = \langle k, l \rangle \exp\left(-\epsilon_{\theta,i}(k,l)/k_B T\right) \qquad \text{(C-5)}$$

As shown below, the configurational partition function can be
written as the product of the internal bond angle partition
functions associated with each bond angle state $z_{\theta,i}$:

$$Z_D^0 = \prod_{i=2}^{N-1} z_{\theta,i} \qquad \text{(C-6)}$$

The matrix product in equation C-4 is of the form:

$$Z_D^0 = \sum_{k_2=1}^{24} \cdots \sum_{k_{N-1}=1}^{24} \; \prod_{i=2}^{N-1} U_{D,i}(k_{i-1},k_i), \qquad k_1 = 1 \qquad \text{(C-7)}$$

Given that the sum of all the elements in the columns is
independent of the row index (i.e., each row has the same set of
bond angle states that must be summed over), the sum of the
products can be expressed as the product of the sums, as follows:

$$Z_D^0 = \prod_{i=2}^{N-1} \left[ \sum_{k=1}^{24} U_{D,i}(1,k) \right] \qquad \text{(C-8)}$$

which is identical to equation C-6 because $z_{\theta,i}$ is the same as

$$z_{\theta,i} = \sum_{k=1}^{24} U_{D,i}(1,k) \qquad \text{(C-9)}$$

Thus, the separability of the partition function is established.
The free energy of the denatured state is simply

$$A_D^0 = -k_B T \ln(Z_D^0) \qquad \text{(C-10)}$$
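
The separability argument can be illustrated numerically. The Python sketch below builds the 24 x 24 statistical weight matrix, evaluates the matrix product with the row vector (1, 0, ..., 0) on the left and a column of ones on the right, and compares the result with a power of the single-residue sum. It is a sketch under stated assumptions: the bond angle energy is taken to depend only on $r_\theta^2$ and to be the same for every residue, the function names are hypothetical, and the vectors are generated in sorted order rather than in the numbering of this appendix (which does not affect the result, since every row has the same set of weights).

```python
import math
from itertools import permutations, product


def lattice_vectors():
    """All 24 vectors of type (+-2, +-1, 0), in sorted order."""
    vecs = set()
    for perm in permutations((2, 1, 0)):
        for signs in product((1, -1), repeat=3):
            vecs.add(tuple(s * c for s, c in zip(signs, perm)))
    return sorted(vecs)


def statistical_weights(energy, kT):
    """24 x 24 matrix: Boltzmann weight for allowed pairs, 0 otherwise."""
    vecs = lattice_vectors()

    def weight(a, b):
        r2 = sum((x + y) ** 2 for x, y in zip(a, b))
        return math.exp(-energy(r2) / kT) if 6 <= r2 <= 18 else 0.0

    return [[weight(a, b) for b in vecs] for a in vecs]


def partition_function(n_residues, energy, kT):
    """Row vector times one matrix per residue 2..N-1, then summed."""
    u = statistical_weights(energy, kT)
    n = len(u)
    row = [1.0] + [0.0] * (n - 1)
    for _ in range(2, n_residues):
        row = [sum(row[a] * u[a][b] for a in range(n)) for b in range(n)]
    return sum(row)


def bond_angle_sum(energy, kT):
    """Sum of one row of the weight matrix (the single-residue factor)."""
    return sum(statistical_weights(energy, kT)[0])
```

With all energies set to zero the partition function simply counts conformations, 18 per bond angle, and the free energy follows as `-kT * math.log(partition_function(...))`.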

To include the effect of non-zero $\epsilon_\phi$ into the calculation of
the partition function, the chain is divided into statistical
weight matrices associated with pairs of bonds. That is, the
partition function is calculated as

$$Z_D = \mathbf{J}^*_{576} \left[ \prod_{i=2}^{l_u} \mathbf{U}_{\phi,i} \right] \mathbf{J}_{576} \qquad \text{(C-11)}$$

where $\mathbf{J}^*_{576}$ is a row vector of dimensionality 576 whose first term
is unity and remaining terms are zero. $\mathbf{J}_{576}$ is a column vector of
dimensionality 576, all of whose elements are unity. $l_u = N$ if N
is even, and $l_u = N-1$ if N is odd. $\mathbf{U}_{\phi,i}$ is a 576 by 576 matrix.
For convenience in setting up $\mathbf{U}_{\phi,i}$, the torsional angles are
labeled from 3 to N-1, rather than from 2 to N-2 as in the main
text. For i=2, one merely has to account for the bond angle
associated with the second residue. Choosing the first bond as
vector 1, the only non-zero elements of $\mathbf{U}_{\phi,2}$ are

$$U_{\phi,2}(1,j) = \langle 1, j \rangle \exp\left(-\epsilon_\theta(1,j)/k_B T\right). \qquad \text{(C-12)}$$

We next consider the case where $2 < i < l_u$. Let the bond
vectors associated with residues i-3, i-2, i-1 and i be labelled
by j, k, l, m, respectively. The jth bond vector connects residues
i-3 to i-2. The rows and columns of $\mathbf{U}_{\phi,i}$ (row, column) are
obtained from j and k, and from l and m, by

$$\text{row} = (j-1)\,24 + k \qquad \text{(C-13)}$$

$$\text{col} = (l-1)\,24 + m \qquad \text{(C-14)}$$

In defining the statistical weight matrix $u_\phi(j,k,l,i)$
associated with the torsional potential due to the particular
sequence of the three bonds j, k, l (where k goes from vertex i-1
to i), the distance $r_{i-2,i+1}$ between residues i-2 to i+1 is
considered. If the square of this distance is less than 3, then
due to the hard core steric repulsion,

$$u_\phi(j,k,l,i) = 0 \qquad \text{(C-15)}$$

If $r^2_{i-2,i+1} = 3$, then

$$u_\phi(j,k,l,i) = \langle j,k \rangle \langle k,l \rangle \exp\left[-(\epsilon_\phi(j,k,l) + 3\epsilon_{rep})/k_B T\right] \qquad \text{(C-16)}$$

If $r^2_{i-2,i+1} = 5$, then

$$u_\phi(j,k,l,i) = \langle j,k \rangle \langle k,l \rangle \exp\left[-(\epsilon_\phi(j,k,l) + \epsilon_{rep})/k_B T\right] \qquad \text{(C-17)}$$

For all other $r^2_{i-2,i+1}$,

$$u_\phi(j,k,l,i) = \langle j,k \rangle \langle k,l \rangle \exp\left[-\epsilon_\phi(j,k,l)/k_B T\right] \qquad \text{(C-18)}$$

Thus, local short range repulsions are accounted for in the
treatment as well.
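
The case analysis of equations C-15 through C-18 can be sketched in Python as below. The sketch is illustrative and its names are hypothetical; `eps_phi` stands for the torsional energy already evaluated for the given triple of bonds, and `eps_rep` for the repulsive energy, both supplied by the caller.

```python
import math


def vsum(*vectors):
    return tuple(sum(c) for c in zip(*vectors))


def norm2(v):
    return sum(c * c for c in v)


def allowed(a, b):
    """Pseudo inner product: 1 for an allowed pair of bonds, else 0."""
    return 1 if 6 <= norm2(vsum(a, b)) <= 18 else 0


def torsional_weight(j, k, l, eps_phi, eps_rep, kT):
    """Statistical weight for three consecutive bond vectors j, k, l."""
    r2 = norm2(vsum(j, k, l))  # squared distance, residue i-2 to i+1
    if r2 < 3:
        return 0.0             # hard core steric repulsion
    if r2 == 3:
        e = eps_phi + 3.0 * eps_rep
    elif r2 == 5:
        e = eps_phi + eps_rep
    else:
        e = eps_phi
    return allowed(j, k) * allowed(k, l) * math.exp(-e / kT)
```

For instance, the bonds (2,1,0), (-1,0,2), (0,-2,-1) bring residues i-2 and i+1 to a squared distance of 3, so the weight carries three units of the repulsive energy.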
For $2 < i < l_u$, if $l_u$ is even, and for $2 < i \le l_u$, if $l_u$ is odd, then

$$U_{\phi,i}(j,k,l,m) = \langle j,k \rangle \langle k,l \rangle \langle l,m \rangle \exp\left[-(\epsilon_\theta(k,l) + \epsilon_\theta(l,m))/k_B T\right] u_\phi(j,k,l,i-1)\, u_\phi(k,l,m,i) \qquad \text{(C-19)}$$

If $i = l_u$, and $l_u$ is even then, since vertex i is at the end of the
chain, it is necessary to only account for the last bond angle
and torsional angle associated with vertex N-1. To make this
last matrix conformable with the previous matrices (e.g., vector
type 1), an extra bond is appended at the end of the chain,
giving:

$$U_{\phi,N}(j,k,l,1) = \langle j,k \rangle \langle k,l \rangle \langle l,1 \rangle \exp\left[-\epsilon_\theta(k,l)/k_B T\right] u_\phi(j,k,l,N-1) \qquad \text{(C-20)}$$

From the above definitions of $\mathbf{U}_\phi$ and $Z_D$, it is seen that the
free energy $A_D$ of the denatured state can be determined from the
equation:

$$A_D = -k_B T \ln(Z_D).$$
APPENDIX D

c     only left handed diamond lattice vectors can interact
c     revised to include a finer trajectory
c     generalized to include other favoring of torsional potentials
c     uses full hydrophobic hydrophobic interaction matrix
c     based on Miyazawa Jernigan interaction scale
c
C     **FIXES THE PROBLEM OF GLYCINES AT POSITIONS 3 AND LENF-2
C     ** PRESENT IN ALL PREVIOUS VERSIONS
c     ncglyshort.f is a version of ncthermshort.f but which introduces
c     thermalization into the wave displacements
c     generates short trajectories
c     like jstherrr. but it also
c     calculates the number of native contacts
c
C     SIDECHAINS ONLY A DISTANCE OF THE SQUARE ROOT OF TWO CAN I
c     all other faces of the sidechain are hard core
c     produces equivalent interaction for 12 and 16 states
c     should produce shifted sq(10) for beta barrel like states
c     with hydrophobic core
c     program uses serine.f and ergd.f
C
C     PROTEIN 201
C     THE NEXT GENERATION
C
C     WITH GLYCINE (NO SIDE GROUP) CODED AS 0 (zero) HYDDROPHC-ICI.;
C     NO GLYCINE ASSUMED ON THE THREE ENDS SEGMENTS . ALL0X5 FOR £ STATi.
C
C     PROGRAM SIMULATES SIMPLIFIED MODELS OF GLOBULAR PROTEINS BASED ON
C     THE "210" LATTICE ALPHA-CARBON REPRESENTATION. INCLUDES SOME
C     DETAILS OF A SEQUENCE DESCRIPTION. HAS BUILD-IN CHIRALITY OF THE
C     AMINOACIDS. ASYMMETRIC METROPOLIS SCHEME WITH A VARIETY OF L^w^
C     REARRANGEMENTS OF MAIN (AND SIDE GROUPS) CHAIN BACKBONE. EDITED BY
C     AX - FEB. 1989 ST. LOUIS.
C
C     REPULSIVE INTERACTIONS SQRT(5) ^ lBC
C     WITH 'WAVE' MOTIONS, HYDROGEN BONDS, COOPERATIVITY, SIDE GROUPS . .
C
C     THREE (+1) SITES SIDE GROUPS
C
C     NOTE THAT THIS PROGRAM USES EREP5 , EHB , setini , REMOYEo , ^OOivu , ER«-c
c     setind.f allows for interactions between left handed chirality
c     diamond lattice vector
c     vaxran version
c
C     ******************************************************************
c
c     this version of program was created on 5/12/89
c     constructs the torsional potential in the program
c
c     PHISEQ used
c     ah is the 1./temp in the thermalization step of the waves and
c     rotations
c     at the head of the INPUT file
c
C     THE LATERAL TRANSLATION OF A STRING ADDED
C     WITH SPECIFICATION OF THE TORSIONAL POTENTIAL FOR SEQUENCE
C     THIS IS GIVEN IN APH(24,24,24,'LENGTH') ARRAY WHICH HAS TO BE
C     PREPARED AS AN INPUT FILE FILENAME='PHIPAT' USE AKPHIMAKE

C     LIST OF BACKBONE VECTORS - USE FOR ANALYSE OF LOCAL GEOMETRY
C
C     VECTOR NR     1      2  1  0       0 -1  1
C                   2      2  0  1       0  1 -1
C                   3      2 -1  0       0 -1 -1
C     (CODES ALSO   4      2  0 -1       0  1  1
C     FOR DIAMOND
C     LATTICE TL)   5      1  2  0      -1  1  0
C                   6      1  0  2       1 -1  0
C                   7      1 -2  0      -1  0  1
C                   8      1  0 -2       1  0 -1
C                   9      0  1  2      -1  0 -1
C                  10      0  2  1       1  0  1
C                  11      0 -1  2      -1 -1  0
C                  12      0  2 -1       1  1  0
C
C                  13      0 -1 -2       F.C.C.
C                  14      0 -2 -1       LATTICE
C                  15      0  1 -2       VECTORS
C                  16      0 -2  1
C
C                  17     -1  2  0
C                  18     -1  0  2
C                  19     -1 -2  0
C                  20     -1  0 -2
C
C     VECTOR NR    21     -2  1  0
C     IS           22     -2  0  1
C     THE          23     -2 -1  0
C     CODE         24     -2  0 -1
c
c
      IMPLICIT INTEGER (I-Z)
      REAL vaxran

      double precision etot,etot2,cv,anct,ant
      real asumr2,asums2,as2
      LOGICAL GOODC,LOOK

      parameter (ndim=150)

nTMPWSTON ASTR ( ndim ) , IDIS { ndim ) , STATN ( ndim ) _ 

SSSSzW astri?ndii) ,RIDIS(ndim) ,RSTATN(ndim> .RIHMO(na«> 



DIMENSION XYZ (ndim, ndim, ndim) , J Z g*%\ ' ihan<5 < ndin) 

      DIMENSION VECTOR(-2:2,-2:2,-2:2),VX(24),VY(24),VZ(24)
      DIMENSION ICONF(24,24),GOODC(24,24)
      DIMENSION VECT1(24,24,5),VECT2(24,24,5)
      DIMENSION SIDGR1(24,24),SIDGR2(24,24),SIDGR3(24,24)
      DIMENSION ICA(0:ndim),STLX(13),STLY(13),STLZ(13)

B AC(ndim,20>,AK(ndimn^ 

DIMENSION IC8(ndim),IC10(ndim),IC12^ >- 
DIMENSION PRODV ( 24 , 24 ) , ICAO ( naim ) , APH ( 24 , 2* , 2* . n--- . 
DIMENSION XNEW( ndim) , YNEW( nam) , ZKEW( ndim) , IMW n— 1= / 
      dimension iflip(20,5),inc(ndim,ndim)
      dimension S1X(24,24)
      dimension S1Y(24,24)
      dimension S1Z(24,24)

dimension xt ( ndim) ,yt< ndim) ,zt( ndim) aVlv< ,,_^._ nd ^x 

DIMENSION AM( ndim, ndim) , IHYD(nda.m) , IC6(nd2*0 ahyc(..-- 1 .wnc r; .) 

XYZ - OCCUPANCY LIST WITH SIDE GROuPS ( 0^- =1 -S?->. . . . 

X v Z - EXPLICIT COORDINATES OF j.-j-E.»r 

ICONF - R2 (VECTOR CODE, VECTOR CODE) 

ICA - EXPLICITS VECTORS DOWN THE CnA^ = 

l?u - ENERGY 0? A GIVEN SEQUENCE 0? Tr.r.r.- . 

ON CONFORMATION AND THE NUMBER Or ----- r.is— - - 

      DATA VX /4*2,4*1,8*0,4*-1,4*-2/
      DATA VY /1,0,-1,0,2,0,-2,0,1,2,-1,2,-1,-2,1,-2,2,0,-2,0,1,0,-1,0/
      DATA VZ /0,1,0,-1,0,2,0,-2,2,1,2,-1,-2,-1,-2,1,0,2,0,-2,0,1,0,-1/

C     FCC LATTICE VECTORS (AND 000)
      DATA STLX /4*0,-1,1,-1,1,-1,1,-1,1,0/
      DATA STLY /-1,1,-1,1,1,-1,4*0,-1,1,0/
      DATA STLZ /1,-1,-1,1,2*0,1,-1,-1,1,3*0/

C     TETRAHEDRAL LATTICE VECTORS

      DATA TLX /1,-1,1,-1,1,-1,1,-1/
      DATA TLY /-1,1,-1,1,1,-1,1,-1/
      DATA TLZ /-1,1,1,-1,-1,1,1,-1/

C     CODING THE VECTORS TO THE ARRAY

      DO XX=-2,2
      DO YY=-2,2
      DO ZZ=-2,2
      VECTOR(XX,YY,ZZ)=0
      ENDDO
      ENDDO
      ENDDO

      VECTOR(2,1,0)=1
      VECTOR(2,0,1)=2
      VECTOR(2,-1,0)=3
      VECTOR(2,0,-1)=4
      VECTOR(1,2,0)=5
      VECTOR(1,0,2)=6
      VECTOR(1,-2,0)=7
      VECTOR(1,0,-2)=8
      VECTOR(0,1,2)=9
      VECTOR(0,2,1)=10
      VECTOR(0,-1,2)=11
      VECTOR(0,2,-1)=12
      VECTOR(0,-1,-2)=13
      VECTOR(0,-2,-1)=14
      VECTOR(0,1,-2)=15
      VECTOR(0,-2,1)=16
      VECTOR(-1,2,0)=17
      VECTOR(-1,0,2)=18
      VECTOR(-1,-2,0)=19
      VECTOR(-1,0,-2)=20
      VECTOR(-2,1,0)=21
      VECTOR(-2,0,1)=22
      VECTOR(-2,-1,0)=23
      VECTOR(-2,0,-1)=24



C     LIST OF CONFORMATIONS - THE SUM OF TWO VECTORS

      DO I=1,24
      DO J=1,24
      ICONF(I,J)=(VX(I)+VX(J))**2+(VY(I)+VY(J))**2+(VZ(I)+VZ(J))**2
      IDOTP=IABS(VX(I)*VX(J)+VY(I)*VY(J)+VZ(I)*VZ(J))
      IF(IDOTP.EQ.5) PRODV(I,J)=1
      ENDDO
      ENDDO

C     THE CODE OF A VECTOR READS AS CODE=VECTOR(X,Y,Z)  1 TO 24
C     AND VICE VERSA COORDINATES READ AS X=VX(CODE) !!
C
C     LIST OF ACCEPTABLE CONFORMATIONS 6-18

      DO I=1,24
      DO J=1,24
      IF(ICONF(I,J).LT.6.OR.ICONF(I,J).GT.18) THEN
C     6,8,10,12,14,16, AND R2=18 ALLOWED
      GOODC(I,J)=.FALSE.
      ELSE
      GOODC(I,J)=.TRUE.
      END IF
      ENDDO
      ENDDO

C 

C 

C     FLIP-TWIST ARRAY GIVES A DIRECT PREDICTION OF THE NEW CONF. STATE
C     VECT1(I,J,K) GIVES A FIRST VECTOR AFTER JUMP FROM SEQUENCE OF I-J
C     TO NEW STATES (SOMETIMES DEGENERATED) K=1,..5 (READS AS A CODE)
C
      DO I=1,24
      DO J=1,24
      IF(GOODC(I,J)) THEN
      WX=VX(I)
      WY=VY(I)
      WZ=VZ(I)
      NX=VX(J)
      NY=VY(J)
      NZ=VZ(J)
      VECT1(I,J,1)=J
      VECT1(I,J,4)=J
      VECT1(I,J,5)=J
      VECT2(I,J,1)=I
      VECT2(I,J,4)=I
      VECT2(I,J,5)=I
      ICONA=(ICONF(I,J)-4)/2
      GO TO (6,1,2,3,2,5,2) ICONA
C     CONFORMATION R2=6
C     FOUR POSSIBLE ARRANGEMENTS
    6 SX=WX+NX
      SY=WY+NY
      SZ=WZ+NZ
      IF(IABS(SX).EQ.2) THEN
      IF(SY.NE.SZ) THEN
      WY=-WY
      WZ=-WZ
      NZ=-NZ
      NY=-NY
      ENDIF
      WX1=WX
      WX2=NX
      WY1=WZ
      WZ1=WY
      WY2=NZ
      WZ2=NY
      GO TO 15
      ENDIF
      IF(IABS(SY).EQ.2) THEN
      IF(SX.NE.SZ) THEN
      WX=-WX
      WZ=-WZ
      NX=-NX
      NZ=-NZ
      ENDIF
      WY1=WY
      WY2=NY
      WX1=WZ
      WZ1=WX
      WX2=NZ
      WZ2=NX
      GO TO 15
      ENDIF
      IF(IABS(SZ).EQ.2) THEN
      IF(SX.NE.SY) THEN
      WX=-WX
      WY=-WY
      NX=-NX
      NY=-NY
      ENDIF
      WZ1=WZ
      WZ2=NZ
      WX1=WY
      WY1=WX
      WX2=NY
      WY2=NX
      ENDIF
   15 N1=VECTOR(WX1,WY1,WZ1)
      N2=VECTOR(WX2,WY2,WZ2)
      VECT1(I,J,2)=N1
      VECT2(I,J,2)=N2
      VECT1(I,J,3)=N2
      VECT2(I,J,3)=N1
      GO TO 7
C     CONFORMATION R2=8
    1 MX=1
      MY=1
      MZ=1
      IF(IABS(WX).EQ.1) MX=-1
      IF(IABS(WY).EQ.1) MY=-1
      IF(IABS(WZ).EQ.1) MZ=-1
      PX=WX*MX
      PY=WY*MY
      PZ=WZ*MZ
      I2=VECTOR(PX,PY,PZ)
      LX=NX*MX
      LY=NY*MY
      LZ=NZ*MZ
      J2=VECTOR(LX,LY,LZ)
      VECT1(I,J,2)=I2
      VECT2(I,J,2)=J2
      VECT1(I,J,4)=I2
      VECT2(I,J,4)=J2
      VECT1(I,J,5)=J2
      VECT2(I,J,5)=I2
      VECT1(I,J,3)=J2
      VECT2(I,J,3)=I2
      GO TO 7
C     CONFORMATION R2=10
C     CONFORMATION R2=14
C     CONFORMATION R2=18
    2 VECT1(I,J,2)=J
      VECT2(I,J,2)=I
      VECT1(I,J,3)=J
      VECT2(I,J,3)=I
      GO TO 7
C     CONFORMATION R2=12
    3 TEMPCO=3*WX*NX+2*WY*NY+WZ*NZ
      SX=WX+NX
      SY=WY+NY
      SZ=WZ+NZ
C     TEMPCO =3 X AXIS DIRECTION IN THE ORIGINAL STATE
C            =2 Y
C            =1 Z DIRECTION
      GO TO (13,12,11) TEMPCO
   11 WX1=SX
      WX2=0
      WZ1=0
      WZ2=SZ
      WY1=SY/2
      WY2=SY/2
      MX1=SX
      MX2=0
      MY1=0
      MY2=SY
      MZ1=SZ/2
      MZ2=SZ/2
      GO TO 14
   12 WY1=SY
      WY2=0
      WZ1=0
      WZ2=SZ
      WX1=SX/2
      WX2=SX/2
      MY1=SY
      MY2=0
      MX1=0
      MX2=SX
      MZ1=SZ/2
      MZ2=SZ/2
      GO TO 14
   13 WZ1=SZ
      WZ2=0
      WX1=0
      WX2=SX
      WY1=SY/2
      WY2=SY/2
      MZ1=SZ
      MZ2=0
      MY1=0
      MY2=SY
      MX1=SX/2
      MX2=SX/2
   14 N1=VECTOR(WX1,WY1,WZ1)
      N2=VECTOR(WX2,WY2,WZ2)
      VECT1(I,J,2)=N1
      VECT2(I,J,2)=N2
      M1=VECTOR(MX1,MY1,MZ1)
      M2=VECTOR(MX2,MY2,MZ2)
      VECT1(I,J,3)=M1
      VECT2(I,J,3)=M2
      VECT1(I,J,4)=N2
      VECT2(I,J,4)=N1
      VECT1(I,J,5)=M2
      VECT2(I,J,5)=M1
      GO TO 7
C     CONFORMATION R2=16
    5 SX=WX+NX
      SY=WY+NY
      SZ=WZ+NZ
      TEMPCO=(3*IABS(SX)+2*IABS(SY)+IABS(SZ))/4
      GO TO (23,22,21) TEMPCO
   21 WX1=WX
      WX2=NX
      WY1=WZ
      WY2=NZ
      WZ1=WY
      WZ2=NY
      MX1=WX
      MX2=NX
      MY1=NZ
      MY2=WZ
      MZ1=NY
      MZ2=WY
      GO TO 24
   22 WY1=WY
      WY2=NY
      WX1=WZ
      WX2=NZ
      WZ1=WX
      WZ2=NX
      MY1=WY
      MY2=NY
      MX1=NZ
      MX2=WZ
      MZ1=NX
      MZ2=WX
      GO TO 24
   23 WZ1=WZ
      WZ2=NZ
      WX1=WY
      WX2=NY
      WY1=WX
      WY2=NX
      MZ1=WZ
      MZ2=NZ
      MX1=NY
      MX2=WY
      MY1=NX
      MY2=WX
   24 N1=VECTOR(WX1,WY1,WZ1)
      N2=VECTOR(WX2,WY2,WZ2)
      VECT1(I,J,2)=N1
      VECT2(I,J,2)=N2
      VECT1(I,J,4)=N2
      VECT2(I,J,4)=N1
      M1=VECTOR(MX1,MY1,MZ1)
      M2=VECTOR(MX2,MY2,MZ2)
      VECT1(I,J,3)=M1
      VECT2(I,J,3)=M2
      VECT1(I,J,5)=M2
      VECT2(I,J,5)=M1
    7 CONTINUE
C     CONFORMATION IS NOT ACCEPTABLE
      ELSE
      DO K=1,5
      VECT1(I,J,K)=0
      VECT2(I,J,K)=0
      ENDDO
      END IF
      ENDDO
      ENDDO
C
C     SIDE GROUPS - EXPLICIT DEFINITION BASED ON CONFORMATION R2
C
C     SIDGR1(24,24) - CONTAINS CODES OF 110 VECTORS
C     SIDGR2(24,24) - CONTAINS CODES OF 110 VECTORS
C     SIDGR3(24,24) - CONTAINS CODES OF 110 VECTORS
C
      DO I=1,24
      DO J=1,24
      IF(.NOT.GOODC(I,J)) GO TO 40
      X1=VX(I)
      Y1=VY(I)
      Z1=VZ(I)
      X2=VX(J)
      Y2=VY(J)
      Z2=VZ(J)
      ICONA=(ICONF(I,J)-4)/2
      PX=Y1*Z2-Y2*Z1
      PY=X2*Z1-Z2*X1
      PZ=X1*Y2-Y1*X2
      PX=-PX
      PY=-PY
      PZ=-PZ
      WX=X1-X2
      WY=Y1-Y2
      WZ=Z1-Z2
      GO TO (33,32,33,32,31,32,36) ICONA
C     CONFORMATION R2=14
   31 SUMAX=PX
      SUMAY=PY
      SUMAZ=PZ
      GO TO 39
C     CONFORMATION R2=8
C     CONFORMATION R2=12
C     CONFORMATION R2=16
   32 SUMAX=PX-2*X2
      SUMAY=PY-2*Y2
      SUMAZ=PZ-2*Z2
      GO TO 39
C     CONFORMATION R2=6
C     CONFORMATION R2=10
   33 SUMAX=PX+WX
      SUMAY=PY+WY
      SUMAZ=PZ+WZ
      GO TO 39
c     CONFORMATION R2=18
C
   36 IF(PX*PY.NE.0) THEN
C     THE CASE OF DOWN THE AXIS CONFORMATION
      SUMAX=PX
      SUMAY=PY
      SUMAZ=PZ
      GO TO 39
      ENDIF
c     THE CASE OF 330 CONFORMATION
      SUMAX=PX+WX
      SUMAY=PY+WY
      SUMAZ=PZ+WZ
C
   39 SUX=ISIGN(1,SUMAX)
      SUY=ISIGN(1,SUMAY)
      SUZ=ISIGN(1,SUMAZ)
      X1=SUX
      X2=SUX
      X3=0
      Y1=SUY
      Y2=0
      Y3=SUY
      Z1=0
      Z2=SUZ
      Z3=SUZ
C     GIVES THE CODE OF (STLX,STLY,STLZ), VALUE 1,2,..12
      ICODT=9*X1+3*Y1+Z1
      IF(ICODT.LT.0) ICODT=-1-ICODT
      SIDGR1(I,J)=ICODT
      ICODT=9*X2+3*Y2+Z2
      IF(ICODT.LT.0) ICODT=-1-ICODT
      SIDGR2(I,J)=ICODT
      ICODT=9*X3+3*Y3+Z3
      IF(ICODT.LT.0) ICODT=-1-ICODT
      SIDGR3(I,J)=ICODT
c     insert of check for handedness
      x4=(x1+x2+x3)/2
      y4=(y1+y2+y3)/2
      z4=(z1+z2+z3)/2
      S1X(i,j)=x4
      S1Y(i,j)=y4
      S1Z(i,j)=z4
C
   40 CONTINUE
      ENDDO
      ENDDO



C     INPUT INPUT INPUT INPUT INPUT INPUT INPUT INPUT INPUT
C     SET UP OF THE VECTOR REPRESENTATION OF THE CHAIN

      OPEN(UNIT=5,FILE='INPUT',STATUS='OLD')
      OPEN(UNIT=10,FILE='FILEDAT',STATUS='OLD')
      OPEN(UNIT=6,FILE='OUTPUT',STATUS='OLD')
      OPEN(UNIT=1,FILE='SEQUENCE',STATUS='OLD')
      OPEN(UNIT=11,FILE='contact',STATUS='OLD')
      OPEN(UNIT=12,FILE='PHISEQH',STATUS='OLD')
      OPEN(UNIT=14,FILE='PHISEQHR',STATUS='OLD')
      OPEN(UNIT=13,FILE='TRACE',STATUS='OLD')
      open(unit=15,file='hydmap',status='old')
      open(unit=16,file='output',status='old')

      READ(10,*) LENF
      LENF1=LENF-1
      LENF2=LENF-2
      AL2=LENF2
      LENF3=LENF-3
      AL3=LENF3
      LENF4=LENF-4
      AL4=LENF4
      AL6=LENF-6
      AL9=LENF-9
      KIX=1
      LENHA=LENF/2

      do i=1,100
      STATN(i)=0.d0
      IDIS(i)=0.d0
      ASTR(i)=0.d0
      IHAND(i)=0.d0
      RSTATN(i)=0.d0
      RIDIS(i)=0.d0
      astrr(i)=0.d0
      RIHAND(i)=0.d0
      do j=1,100
      do k=1,100
      xyz(k,j,i)=0.d0
      end do
      end do
      end do



C     ******************** SEQUENCE READING ****************************
C
C     **************** READING OF TORSIONAL POTENTIALS *****************
C
C     EXPLICIT CONSTRUCTION OF APH
      DO LJ=2,LENF1
      READ(12,*) K,STATN(LJ),IDIS(LJ),ASTR(LJ),IHAND(LJ)
      END DO

c     generalized to include other conformational preferences

      read(14,*) other
      if (other .eq. 0) go to 542
      do li=1,other
      READ(14,*) K,RSTATN(k),RIDIS(k),ASTRR(k),RIHAND(k)
      END DO
  542 continue

C     ******************** SEQUENCE READING ****************************
C
C     CAUTION::: THE REVERSE PATTERN IS NOT ALLOWED HAS TO BE
C     ASSUMED AS A PREFERENCE FOR A GIVEN STATE
C
C     STATN - R2(I-1,I+1)
C     IDIS - R2(I-1,I+2)
C     ASTR - STRENGTH OF PREFERENCE FOR THE DIHEDRAL ANGLE
c     ah plays the role of the thermalization factor

      read(5,*) ah
      WRITE(6,8120)

£-20 FORMAT (IX, ' ** THE THREE SIDE GROlr rR0oRA>. § 

- AND GLI clypr edict. f full hydrophobic interaction matrix 

'uses not necessarily native conf ir. torsions',/; 

      NBGL=0
      INDGL(1)=0
      INDGL(LENF)=0
      DO I=2,LENF1
      INDGL(I)=1
      READ(1,*) K,IC6(I),
     * IC8(I),IC10(I),IC12(I),IC14(I),IC16(I),IC18(I),IHYD(I)
      IF(IHYD(I).EQ.0) THEN
      INDGL(I)=0
      NBGL=NBGL+1
      ENDIF
      ENDDO

r 

c »«****•»•*•»«**»»*. »»*input f: 



      READ(5,*) RANDOM,NCYCLE,PHOT
      READ(5,*) AC6,AC8,AC10,AC12,AC14,AC16,AC18
      READ(5,*) APLPB,BPLPL,CPBPB,AREP,AHB,APHI
      READ(5,*) ATEMP,WAVEL

      WRITE(6,8020) RANDOM,NCYCLE,PHOT,WAVEL
 8020 FORMAT(1X,' THE THREE SIDE GROUP PROGRAM AND GLI **',/,
     * 'DIAMOND LATTICE SITES INTERACT',/,
     * ' ** glypredict with .5 *jernigan',/,
     * 1X,' RANDOM SEED =',I6,' NUMBER OF CYCLES',2I5,/,
     * 1X,' MAXIMUM WAVE LENGTH =',I4,/)
      write(6,8999) ah
 8999 format(1x,' temp/tempthermal =',1f8.4)

      WRITE(6,8021) AC6,AC8,AC10,AC12,AC14,AC16,AC18
 8021 FORMAT(5X,/,3X,' ENERGY OF STATE 6 =',F6.2,/,
     * 3X,' ENERGY OF STATE 8 =',F6.2,/,
     * 3X,' ENERGY OF STATE 10=',F6.2,/,
     * 3X,' ENERGY OF STATE 12=',F6.2,/,
     * 3X,' ENERGY OF STATE 14=',F6.2,/,
     * 3X,' ENERGY OF STATE 16=',F6.2,/,
     * 3X,' ENERGY OF STATE 18=',F6.2,/)
      WRITE(6,8022) APLPB,BPLPL,CPBPB
 8022 FORMAT(3X,' PHIL-PHOB, PHIL-PHIL, PHOB-PHOB ',4F8.3,/)
      WRITE(6,8024) AREP,AHB,APHI
 8024 FORMAT(3X,' REPULSIVE INT. AND COOPER. H-BOND',2F8.3,/
     * ,3X,' SCALING FACTOR FOR DIHEDRAL ANGLE POTENTIAL ',F8.3,/)
      WRITE(6,8023) ATEMP
 8023 FORMAT(1X,/,3X,' TEMPERATURE OF THE SYSTEM',F8.3,/)

c     construction of native contact map

      do i=1,lenf2
      do j=i,lenf1
      inc(i,j)=0.d0
      inc(j,i)=0.d0
      end do
      end do

      read(11,*) ntot
      write(6,2039) ntot
 2039 format(1x,'ntot=',i3,/,' the native contact pairs are')
      do i=1,ntot
      read(11,*) j,k
      inc(j,k)=1
      write(6,*) j,k
      end do

C     *********** SET THE CURRENT FORCE OF INTERACTIONS

      APHI=APHI/ATEMP
      AHB=AHB/ATEMP
      AC6=AC6/ATEMP
      AC8=AC8/ATEMP
      AC10=AC10/ATEMP
      AC12=AC12/ATEMP
      AC14=AC14/ATEMP
      AC16=AC16/ATEMP
      AC18=AC18/ATEMP
      APLPB=APLPB/ATEMP
      BPLPL=BPLPL/ATEMP
      CPBPB=CPBPB/ATEMP
      AREP=AREP/ATEMP

      do i=1,lenf2
      read(15,*) (ahyd(i,j),j=1,lenf2)
      write(6,*) (ahyd(i,j),j=1,lenf2)
      do j=i,lenf2
      ahyd(j,i)=ahyd(i,j)
      end do
      end do



      DO I=2,LENF1
      im1=i-1
      DO J=2,lenf1
      jm1=j-1
      IF(IABS(I-J).GT.2) THEN
      am(i,j)=ahyd(im1,jm1)/atemp
      ELSE
C     A PRIORI NO INTERACTIONS OF SIDE GROUPS WHEN /I-J/<3
      ENDIF
      ENDDO
      ENDDO

      DO I=2,LENF1
      AC(I,6)=IC6(I)*AC6
      AC(I,8)=IC8(I)*AC8
      AC(I,10)=IC10(I)*AC10
      AC(I,12)=IC12(I)*AC12
      AC(I,14)=IC14(I)*AC14
      AC(I,16)=IC16(I)*AC16
      AC(I,18)=IC18(I)*AC18
      ENDDO



C ***READING OF TORSIONAL POTENTIALS***
      DO 100 I=1,24
      X1=VX(I)
      Y1=VY(I)
      Z1=VZ(I)
      DO 1200 J=1,24
      IF(GOODC(I,J)) THEN
      X2=VX(J)
      Y2=VY(J)
      Z2=VZ(J)
C CROSS PRODUCT OF THE FIRST TWO VECTORS
      PX=Y1*Z2-Y2*Z1
      PY=Z1*X2-Z2*X1
      PZ=X1*Y2-Y1*X2
      st1=iconf(i,j)
      DO 300 K=1,24
      IF(.NOT.GOODC(J,K)) GO TO 300
      X3=VX(K)
      Y3=VY(K)
      Z3=VZ(K)
      IHAN=PX*X3+PY*Y3+PZ*Z3
      IHAN=SIGN(1,IHAN)
      st2=iconf(j,k)
      DO 401 INDEX=2,LENF2
      B=0.
      aph(i,j,k,index)=0.
      IF(STATN(INDEX).EQ.ST1.AND.STATN(INDEX+1).EQ.ST2) THEN
      KX=X1+X2+X3
      KY=Y1+Y2+Y3
      KZ=Z1+Z2+Z3
      R2=KX*KX+KY*KY+KZ*KZ
      IF(R2.EQ.IDIS(INDEX).and.ihan.eq.ihand(index)) B=ASTR(INDEX)
      ENDIF
      APH(I,J,K,INDEX)=(B)*APHI
      IF(RSTATN(INDEX).EQ.ST1.AND.RSTATN(INDEX-1).EQ.ST2) THEN
      KX=X1+X2+X3
      KY=Y1+Y2+Y3
      KZ=Z1+Z2+Z3
      R2=KX*KX+KY*KY+KZ*KZ
      IF(R2.EQ.RIDIS(INDEX).and.ihan.eq.Rihand(index)) B=astrR(INDEX)
      ENDIF
      APH(I,J,K,INDEX)=(B)*APHI
  401 CONTINUE
  300 CONTINUE
      END IF
 1200 CONTINUE
  100 CONTINUE

      ICA(0)=1
C this is because of the simplicity of APH reading, value irrelevant
      ICA(LENF)=1
C caution (zero initialization assumed)
      MAX=100
      MID=MAX/2
      SX=0
      SY=0
      SZ=0

      DO I=1,LENF
      READ(10,*) X(I),Y(I),Z(I)
      SX=SX+X(I)
      SY=SY+Y(I)
      SZ=SZ+Z(I)
      ENDDO

      SX=SX/LENF
      SY=SY/LENF
      SZ=SZ/LENF
      XSHIFT=MID-SX
      YSHIFT=MID-SY
      ZSHIFT=MID-SZ
      DO I=1,LENF
      X(I)=X(I)+XSHIFT
      Y(I)=Y(I)+YSHIFT
      Z(I)=Z(I)+ZSHIFT
      ENDDO

      DO I=1,LENF1
      J=I+1
      WX=X(J)-X(I)
      WY=Y(J)-Y(I)
      WZ=Z(J)-Z(I)
      ICA(I)=VECTOR(WX,WY,WZ)
      ENDDO

      CALL setin(XYZ,INDGL(1),X(1),Y(1),Z(1),13,13,13,1)
      CALL setin(XYZ,INDGL(LENF),X(LENF),Y(LENF),Z(LENF),13,13,13,1)

      DO J=2,LENF1
      I=J-1
      II=ICA(I)
      JJ=ICA(J)
      IF(GOODC(II,JJ)) THEN
      S1=SIDGR1(II,JJ)
      S2=SIDGR2(II,JJ)
      S3=SIDGR3(II,JJ)
      CALL setin(XYZ,INDGL(J),X(J),Y(J),Z(J),S1,S2,S3,J)
      ELSE
      WRITE(6,8001) I,J
 8001 FORMAT(5X,'WRONG INPUT CHAIN - VECTORS ',2I4)
      GO TO 9000
      END IF
      ENDDO

C
C CALCULATION OF THE ENERGY OF INITIAL STATE
C
      E=0.
      ENERG=0.
      DO J=2,LENF1
      I=J-1
      II=ICA(I)
      JJ=ICA(J)
C ROTATIONAL CONTRIBUTION
      JCONF=ICONF(II,JJ)
      ENERG=ENERG+AC(J,JCONF)
C INTERACTIONS OF SIDE GROUPS
      IX=X(J)+S1X(ii,jj)
      IY=Y(J)+S1Y(ii,jj)
      IZ=Z(J)+S1Z(ii,jj)
      E=E+ERG(XYZ,INDGL(J),AM,IX,IY,IZ,J)
      continue
      ENDDO
      ENERG=ENERG+E/2.
C COOPERATIVE AND HYDROGEN BOND
      E=0.
      DO I=2,LENF1
      E=E+EHB(XYZ,ICA,PRODV,X(I),Y(I),Z(I),I,AHB)
      ENDDO
      ENERG=ENERG+E/2
C REPULSIVE INTERACTIONS
      E=0.
      DO I=2,LENF1
      E=E+EREPUL(XYZ,X(I),Y(I),Z(I),1.)
 4501 CONTINUE
      ENDDO
C this is because of the implicit symmetry of repulsive interactions
C which is taken into account in the remainder of the program.
      E=(E-AL3*2.)/2.
      ENERG=ENERG+E*AREP
C DIHEDRAL POTENTIAL
      DO J=2,LENF2
      II=ICA(J-1)
      JJ=ICA(J)
      KK=ICA(J+1)
      ENERG=ENERG+APH(II,JJ,KK,J)
      ENDDO
c /////////////////////////////////////////////////
      RN1=RANDOM*2+7531
      RN2=RANDOM*2+8883
      RN3=RANDOM*6+7907
C * DYNAMICS OF THE CHAIN

C *
C *****************************************************
C
C MAIN CLOCK OF THE ALGORITHM
C
      ICLOCK=1
C
      QROT=0
      QWAVE=0
      QKINK=0
      QEND=0
C
      asumr2=0.
      asums2=0.
      etot=0.d0
      etot2=0.d0
      sxd=0.d0
      syd=0.d0
      szd=0.d0
      anct=0.d0
      ant=0.d0
      write(6,931)
  931 format(1x,'iterm=  R2=  AS2=  ENERGY=  native  any contacts')



      DO 7777 ITERM=1,NCYCLE
      iclock=iclock+1
      DO 7700 IDUMI=1,100
      iclock=iclock+1
      DO 7770 IPHCO=1,PHOT
C
      IF(ICLOCK.GT.2000) ICLOCK=ICLOCK-vaxran(rn2)*1000
C
C LATERAL WAVE DISPLACEMENT
C
c set up of the thermalization move
      if (vaxran(rn2).gt..01) then
      af=1.d0
      else
      af=ah
      end if
      IVA=MOD(ICLOCK,WAVEL)*3






      I=INT(vaxran(rn1)*AL6)+3
      IF(I.GT.LENHA) THEN
      IFIRST=I-IVA
      ILAST=I
      ELSE
      IFIRST=I
      ILAST=I+IVA
      ENDIF
      WI=ICA(IFIRST)
      JL=ILAST-1
      WJ=ICA(JL)
      JCONF=ICONF(WI,WJ)
      IF(JCONF.LT.14.OR.JCONF.GT.18) GO TO 7001
      IF(.NOT.GOODC(ICA(IFIRST-1),WJ)) GO TO 7001
      IF(.NOT.GOODC(WJ,ICA(IFIRST+1))) GO TO 7001
      IF(.NOT.GOODC(ICA(ILAST-2),WI)) GO TO 7001
      IF(.NOT.GOODC(WI,ICA(ILAST))) GO TO 7001

C REMOVE THE STRING
      DO K=IFIRST,ILAST
      II=ICA(K-1)
      KK=ICA(K)
      IKS1=SIDGR1(II,KK)
      IKS2=SIDGR2(II,KK)
      IKS3=SIDGR3(II,KK)
      XJ=X(K)
      YJ=Y(K)
      ZJ=Z(K)
      CALL REMOVE(XYZ,INDGL(K),XJ,YJ,ZJ,IKS1,IKS2,IKS3)
      ENDDO

C SETIN AND EXCLUDED
C VOLUME TEST
C THE NEW VECTORS
      ICA(IFIRST)=WJ
      ICA(JL)=WI
      IFA=IFIRST-1
      XJ=X(IFA)
      YJ=Y(IFA)
      ZJ=Z(IFA)
      DO J=IFIRST,ILAST
      II=ICA(J-1)
      JJJ=ICA(J)
      XJ=XJ+VX(II)
      YJ=YJ+VY(II)
      ZJ=ZJ+VZ(II)
      XNEW(J)=XJ
      YNEW(J)=YJ
      ZNEW(J)=ZJ
      S1=SIDGR1(II,JJJ)
      S2=SIDGR2(II,JJJ)
      S3=SIDGR3(II,JJJ)
      IF(LOOK(XYZ,INDGL(J),XJ,YJ,ZJ,S1,S2,S3)) THEN
C THEN REMOVE AND TERMINATE
      IF(J.EQ.IFIRST) GO TO 2004
      DO K=IFIRST,J-1
      KK=ICA(K-1)
      KKK=ICA(K)
      S1=SIDGR1(KK,KKK)
      S2=SIDGR2(KK,KKK)
      S3=SIDGR3(KK,KKK)
      CALL REMOVE(XYZ,INDGL(K),XNEW(K),YNEW(K),ZNEW(K),S1,S2,S3)
      ENDDO
 2004 ICA(IFIRST)=WI
      ICA(JL)=WJ
      DO I=IFIRST,ILAST
      II=ICA(I-1)
      JJJ=ICA(I)
      S1=SIDGR1(II,JJJ)
      S2=SIDGR2(II,JJJ)
      S3=SIDGR3(II,JJJ)
      CALL setin(XYZ,INDGL(I),X(I),Y(I),Z(I),S1,S2,S3,I)
      ENDDO
      GO TO 7001
      ELSE
C SET NEW BEAD
      CALL setin(XYZ,INDGL(J),XJ,YJ,ZJ,S1,S2,S3,J)
      ENDIF
      ENDDO
C THE NEW STRING KEEPS EXCLUDED VOLUME
C

C COMPUTATION OF ENERGY OF THE NEW CONFORMATION AND REMOVE STRING
      ENEW=0.
      ER=0.
      E=0.
      DO J=IFIRST,ILAST
      I=J-1
      II=ICA(I)
      JJ=ICA(J)
      XJ=XNEW(J)
      YJ=YNEW(J)
      ZJ=ZNEW(J)
      S1=SIDGR1(II,JJ)
      S2=SIDGR2(II,JJ)
      S3=SIDGR3(II,JJ)
      JCONF=ICONF(II,JJ)
      ENEW=ENEW+AC(J,JCONF)+APH(II,JJ,ICA(J+1),J)
C INTERACTIONS OF SIDE GROUPS
      IX=XJ+S1X(ii,jj)
      IY=YJ+S1Y(ii,jj)
      IZ=ZJ+S1Z(ii,jj)
      E=E+ERG(XYZ,INDGL(J),AM,IX,IY,IZ,J)
C COOPERATIVE AND HYDROGEN BOND
      E=E+EHB(XYZ,ICA,PRODV,XJ,YJ,ZJ,J,AHB)
C REPULSIVE INTERACTIONS
      ER=ER+EREPUL(XYZ,XJ,YJ,ZJ,AREP)
      CALL REMOVE(XYZ,INDGL(J),XJ,YJ,ZJ,S1,S2,S3)
      ENDDO
      ENEW=ENEW+APH(ICA(IFIRST-2),ICA(IFA),ICA(IFIRST),IFA)
      ENEW=ENEW+E+ER



C COMPUTATION OF THE OLD ENERGY AND SETIN OF THE CHAIN PIECE
C
C THE OLD VECTORS
      ICA(IFIRST)=WI
      ICA(JL)=WJ
      EOLD=0.
      ER=0.
      E=0.
      DO J=IFIRST,ILAST
      XJ=X(J)
      YJ=Y(J)
      ZJ=Z(J)
      II=ICA(J-1)
      JJJ=ICA(J)
      S1=SIDGR1(II,JJJ)
      S2=SIDGR2(II,JJJ)
      S3=SIDGR3(II,JJJ)
c s4=sidgr4(ii,jjj)
c tx=tlx(s4)
c ty=tly(s4)
c tz=tlz(s4)
      CALL setin(XYZ,INDGL(J),XJ,YJ,ZJ,S1,S2,S3,J)
      JCONF=ICONF(II,JJJ)
      EOLD=EOLD+AC(J,JCONF)+APH(II,JJJ,ICA(J+1),J)
C INTERACTIONS OF SIDE GROUPS
      IX=XJ+S1X(ii,jjj)
      IY=YJ+S1Y(ii,jjj)
      IZ=ZJ+S1Z(ii,jjj)
      E=E+ERG(XYZ,INDGL(J),AM,IX,IY,IZ,J)
c COOPERATIVE AND HYDROGEN BOND
      E=E+EHB(XYZ,ICA,PRODV,XJ,YJ,ZJ,J,AHB)
C REPULSIVE INTERACTIONS
      ER=ER+EREPUL(XYZ,XJ,YJ,ZJ,AREP)
      ENDDO
      EOLD=EOLD+APH(ICA(IFIRST-2),ICA(IFA),ICA(IFIRST),IFA)
      EOLD=EOLD+E+ER

C
C METROPOLIS CRITERION
      DE=ENEW-EOLD
      IF(EXP(-DE*af).GT.vaxran(rn3)) THEN
C ACCEPTED
      QROT=QROT+1
      ENERG=ENERG+DE
      DO J=IFIRST,ILAST
      II=ICA(J-1)
      JJ=ICA(J)
      S1=SIDGR1(II,JJ)
      S2=SIDGR2(II,JJ)
      S3=SIDGR3(II,JJ)
      CALL REMOVE(XYZ,INDGL(J),X(J),Y(J),Z(J),S1,S2,S3)
      ENDDO
C THE NEW VECTORS
      ICA(IFIRST)=WJ
      ICA(JL)=WI
      DO J=IFIRST,ILAST
      XJ=XNEW(J)
      YJ=YNEW(J)
      ZJ=ZNEW(J)
      X(J)=XJ
      Y(J)=YJ
      Z(J)=ZJ
      II=ICA(J-1)
      JJ=ICA(J)
      S1=SIDGR1(II,JJ)
      S2=SIDGR2(II,JJ)
      S3=SIDGR3(II,JJ)
      CALL setin(XYZ,INDGL(J),XJ,YJ,ZJ,S1,S2,S3,J)
      ENDDO
      ENDIF



 7001 DO 7000 IDUMA=1,LENF4
      ICLOCK=ICLOCK+1
C
      I=INT(vaxran(rn1)*AL4)+2
C RUNS FROM 2 TO LENF-2 (VECTOR INDEX RUNS FROM 1 TO LENF-1)
      J=I+1
      KINK=MOD(ICLOCK,5)+1
C DEFINES KIND OF KINK OF THE VECTORS I-J
      IV=ICA(I)
      JV=ICA(J)
      IIV=VECT1(IV,JV,KINK)
      IP=I-1
      IPV=ICA(IP)
      IF(.NOT.GOODC(IPV,IIV)) GO TO 7000
      JN=J+1
      JNV=ICA(JN)
      JJV=VECT2(IV,JV,KINK)
      IF(.NOT.GOODC(JJV,JNV)) GO TO 7000

C CONFORMATION IS OK - CHECK THE EXCLUDED VOLUME
C REMOVE THE STRING
      JL=J
      ifirst=i
      ilast=jn
      DO K=IFIRST,ILAST
      II=ICA(K-1)
      KK=ICA(K)
      IKS1=SIDGR1(II,KK)
      IKS2=SIDGR2(II,KK)
      IKS3=SIDGR3(II,KK)
      XJ=X(K)
      YJ=Y(K)
      ZJ=Z(K)
      CALL REMOVE(XYZ,INDGL(K),XJ,YJ,ZJ,IKS1,IKS2,IKS3)
      ENDDO

C SETIN AND EXCLUDED
C VOLUME TEST
C THE NEW VECTORS
      ICA(IFIRST)=iiv
      ICA(JL)=jjv
      IFA=IFIRST-1
      XJ=X(IFA)
      YJ=Y(IFA)
      ZJ=Z(IFA)
      DO J=IFIRST,ILAST
      II=ICA(J-1)
      JJJ=ICA(J)
      XJ=XJ+VX(II)
      YJ=YJ+VY(II)
      ZJ=ZJ+VZ(II)
      XNEW(J)=XJ
      YNEW(J)=YJ
      ZNEW(J)=ZJ
      S1=SIDGR1(II,JJJ)
      S2=SIDGR2(II,JJJ)
      S3=SIDGR3(II,JJJ)
      IF(LOOK(XYZ,INDGL(J),XJ,YJ,ZJ,S1,S2,S3)) THEN
C THEN REMOVE AND TERMINATE
      IF(J.EQ.IFIRST) GO TO 2204
      DO K=IFIRST,J-1
      KK=ICA(K-1)
      KKK=ICA(K)
      S1=SIDGR1(KK,KKK)
      S2=SIDGR2(KK,KKK)
      S3=SIDGR3(KK,KKK)
      CALL REMOVE(XYZ,INDGL(K),XNEW(K),YNEW(K),ZNEW(K),S1,S2,S3)
      ENDDO
 2204 ICA(IFIRST)=IV
      ICA(JL)=JV
      DO I=IFIRST,ILAST
      II=ICA(I-1)
      JJJ=ICA(I)
      S1=SIDGR1(II,JJJ)
      S2=SIDGR2(II,JJJ)
      S3=SIDGR3(II,JJJ)
      CALL setin(XYZ,INDGL(I),X(I),Y(I),Z(I),S1,S2,S3,I)
      ENDDO
      GO TO 7000
      ELSE
      CALL setin(XYZ,INDGL(J),XJ,YJ,ZJ,S1,S2,S3,J)
      END IF
      ENDDO
C THE NEW STRING KEEPS EXCLUDED VOLUME

C COMPUTATION OF ENERGY OF THE NEW CONFORMATION AND REMOVE STRING
C
      ENEW=0.
      ER=0.
      E=0.
      DO J=IFIRST,ILAST
      I=J-1
      II=ICA(I)
      JJ=ICA(J)
      XJ=XNEW(J)
      YJ=YNEW(J)
      ZJ=ZNEW(J)
      S1=SIDGR1(II,JJ)
      S2=SIDGR2(II,JJ)
      S3=SIDGR3(II,JJ)
C ROTATIONAL CONTRIBUTION
      JCONF=ICONF(II,JJ)
      ENEW=ENEW+AC(J,JCONF)+APH(II,JJ,ICA(J+1),J)
C INTERACTIONS OF SIDE GROUPS
      IX=XJ+S1X(ii,jj)
      IY=YJ+S1Y(ii,jj)
      IZ=ZJ+S1Z(ii,jj)
      E=E+ERG(XYZ,INDGL(J),AM,IX,IY,IZ,J)
C COOPERATIVE AND HYDROGEN BOND
      E=E+EHB(XYZ,ICA,PRODV,XJ,YJ,ZJ,J,AHB)
C REPULSIVE INTERACTIONS
      ER=ER+EREPUL(XYZ,XJ,YJ,ZJ,AREP)
      CALL REMOVE(XYZ,INDGL(J),XJ,YJ,ZJ,S1,S2,S3)
      ENDDO
      ENEW=ENEW+APH(ICA(IFIRST-2),ICA(IFA),ICA(IFIRST),IFA)
      ENEW=ENEW+E+ER



C COMPUTATION OF THE OLD ENERGY AND SETIN OF THE CHAIN PIECE
C
C THE OLD VECTORS
      ICA(IFIRST)=IV
      ICA(JL)=JV
      EOLD=0.
      ER=0.
      E=0.
      DO J=IFIRST,ILAST
      XJ=X(J)
      YJ=Y(J)
      ZJ=Z(J)
      II=ICA(J-1)
      JJJ=ICA(J)
      S1=SIDGR1(II,JJJ)
      S2=SIDGR2(II,JJJ)
      S3=SIDGR3(II,JJJ)
      s4=sidgr4(ii,jjj)
      tx=tlx(s4)
      ty=tly(s4)
      tz=tlz(s4)
      CALL setin(XYZ,INDGL(J),XJ,YJ,ZJ,S1,S2,S3,J)
      JCONF=ICONF(II,JJJ)
      EOLD=EOLD+AC(J,JCONF)+APH(II,JJJ,ICA(J+1),J)
C INTERACTIONS OF SIDE GROUPS
      IX=XJ+S1X(ii,jjj)
      IY=YJ+S1Y(ii,jjj)
      IZ=ZJ+S1Z(ii,jjj)
      E=E+ERG(XYZ,INDGL(J),AM,IX,IY,IZ,J)
c COOPERATIVE AND HYDROGEN BOND
      E=E+EHB(XYZ,ICA,PRODV,XJ,YJ,ZJ,J,AHB)
c REPULSIVE INTERACTIONS
      ER=ER+EREPUL(XYZ,XJ,YJ,ZJ,AREP)
      ENDDO
      EOLD=EOLD+APH(ICA(IFIRST-2),ICA(IFA),ICA(IFIRST),IFA)
      EOLD=EOLD+E+ER

C
C METROPOLIS CRITERION
C
      DE=ENEW-EOLD
      IF(EXP(-DE).GT.vaxran(rn3)) THEN
      iflip(iconf(iv,jv),kink)=iflip(iconf(iv,jv),kink)+1
      QKINK=QKINK+1
      ENERG=ENERG+DE
      DO J=IFIRST,ILAST
      II=ICA(J-1)
      JJ=ICA(J)
      S1=SIDGR1(II,JJ)
      S2=SIDGR2(II,JJ)
      S3=SIDGR3(II,JJ)
      CALL REMOVE(XYZ,INDGL(J),X(J),Y(J),Z(J),S1,S2,S3)
      ENDDO
C THE NEW VECTORS
      ICA(IFIRST)=IIV
      ICA(JL)=JJV
      DO J=IFIRST,ILAST
      XJ=XNEW(J)
      YJ=YNEW(J)
      ZJ=ZNEW(J)
      X(J)=XJ
      Y(J)=YJ
      Z(J)=ZJ
      II=ICA(J-1)
      JJ=ICA(J)
      S1=SIDGR1(II,JJ)
      S2=SIDGR2(II,JJ)
      S3=SIDGR3(II,JJ)
      CALL setin(XYZ,INDGL(J),XJ,YJ,ZJ,S1,S2,S3,J)
      ENDDO
      ENDIF
 7000 CONTINUE
c idummy=1
c if(idummy .eq. 1) go to 7770
C END FLIPS (TWO BONDS REARRANGEMENTS)
C
C N-TERMINUS (TAIL)
C
      JV3=ICA(3)
   60 NV2=INT(vaxran(rn1)*24.)+1
      IF(.NOT.GOODC(NV2,JV3)) GO TO 60
   61 NV1=INT(vaxran(rn3)*24.)+1
      IF(.NOT.GOODC(NV1,NV2)) GO TO 61
C CONFORMATION IS OK. CHECK THE EXCLUDED VOLUME
      CALL REMOVE(XYZ,INDGL(1),X(1),Y(1),Z(1),13,13,13)
      ICA1=ICA(1)
      ICA2=ICA(2)
      PK21=SIDGR1(ICA1,ICA2)
      PK22=SIDGR2(ICA1,ICA2)
      PK23=SIDGR3(ICA1,ICA2)
      CALL REMOVE(XYZ,INDGL(2),X(2),Y(2),Z(2),PK21,PK22,PK23)
C CHECK THE ROTATION OF SIDE GROUP ON THIRD BEAD
C 8/19/89
c only invoke if no glycines are here
      if(indgl(3).eq.0) go to 3040
      PK31=SIDGR1(ICA2,JV3)
      PK32=SIDGR2(ICA2,JV3)
      PK33=SIDGR3(ICA2,JV3)
C ***************************
      SX1=X(3)+STLX(PK31)
      SX2=X(3)+STLX(PK32)
      SX3=X(3)+STLX(PK33)
      SY1=Y(3)+STLY(PK31)
      SY2=Y(3)+STLY(PK32)
      SY3=Y(3)+STLY(PK33)
      SZ1=Z(3)+STLZ(PK31)
      SZ2=Z(3)+STLZ(PK32)
      SZ3=Z(3)+STLZ(PK33)
      XYZ(SX1,SY1,SZ1)=0
      XYZ(SX2,SY2,SZ2)=0
      XYZ(SX3,SY3,SZ3)=0
      xone=(sx1+sx2+sx3-x(3))/2
      yone=(sy1+sy2+sy3-y(3))/2
      zone=(sz1+sz2+sz3-z(3))/2
      xyz(xone,yone,zone)=0

      NK31=SIDGR1(NV2,JV3)
      NK32=SIDGR2(NV2,JV3)
      NK33=SIDGR3(NV2,JV3)
c s4=sidgr4(nv2,jv3)
c tx=tlx(s4)
c ty=tly(s4)
c tz=tlz(s4)
c ***********************
      MX1=X(3)+STLX(NK31)
      MX2=X(3)+STLX(NK32)
      MX3=X(3)+STLX(NK33)
      MY1=Y(3)+STLY(NK31)
      MY2=Y(3)+STLY(NK32)
      MY3=Y(3)+STLY(NK33)
      MZ1=Z(3)+STLZ(NK31)
      MZ2=Z(3)+STLZ(NK32)
      MZ3=Z(3)+STLZ(NK33)
      IF(XYZ(MX1,MY1,MZ1).NE.0) GO TO 64
      IF(XYZ(MX2,MY2,MZ2).NE.0) GO TO 64
      IF(XYZ(MX3,MY3,MZ3).NE.0) GO TO 64
      mxone=(mx1+mx2+mx3-x(3))/2
      myone=(my1+my2+my3-y(3))/2
      mzone=(mz1+mz2+mz3-z(3))/2
      if (xyz(mxone,myone,mzone).ne.0) go to 64
      XYZ(MX1,MY1,MZ1)=-1
      XYZ(MX2,MY2,MZ2)=-1
      XYZ(MX3,MY3,MZ3)=-1
      xyz(mxone,myone,mzone)=3
c end of check of sidechain conformation if sidechain there is not
c a glycine.
 3040 continue

      NK21=SIDGR1(NV1,NV2)
      NK22=SIDGR2(NV1,NV2)
      NK23=SIDGR3(NV1,NV2)
c ****************************
      WX2=VX(NV2)
      WY2=VY(NV2)
      WZ2=VZ(NV2)
      X2=X(3)-WX2
      Y2=Y(3)-WY2
      Z2=Z(3)-WZ2
      IF(LOOK(XYZ,INDGL(2),X2,Y2,Z2,NK21,NK22,NK23)) GO TO 63
      WX1=VX(NV1)
      WY1=VY(NV1)
      WZ1=VZ(NV1)
      X1=X2-WX1
      Y1=Y2-WY1
      Z1=Z2-WZ1
      IF(LOOK(XYZ,INDGL(1),X1,Y1,Z1,13,13,13)) GO TO 63

C OLD CONFORMATIONAL ENERGY (LOCAL)
      IC3=ICONF(ICA2,JV3)
      IC2=ICONF(ICA1,ICA2)
      COLD=AC(2,IC2)+AC(3,IC3)
      s4=sidgr4(ica1,ica2)
      tx=tlx(s4)
      ty=tly(s4)
      tz=tlz(s4)
      PK23=SIDGR3(ICA1,ICA2)
      ipk21=ihanl(ica1,ica2)
      QX1=X(2)+S1X(ica1,ica2)
      QY1=Y(2)+S1Y(ica1,ica2)
      QZ1=Z(2)+S1Z(ica1,ica2)
      SuX1=X(3)+S1X(ica2,jv3)
      SuY1=Y(3)+S1Y(ica2,jv3)
      SuZ1=Z(3)+S1Z(ica2,jv3)
      EOLD=COLD
     * +ERG(XYZ,INDGL(3),AM,SuX1,SuY1,SuZ1,3)
     * +ERG(XYZ,INDGL(3),AM,SuX3,SuY3,SuZ3,3)
     * +ERG(XYZ,INDGL(2),AM,QX1,QY1,QZ1,2)
     * +APH(ICA1,ICA2,JV3,2)+APH(ICA2,JV3,ICA(4),3)
     * +EREPUL(XYZ,X(2),Y(2),Z(2),AREP)
     * +EHB(XYZ,ICA,PRODV,X(3),Y(3),Z(3),3,AHB)
     * +EHB(XYZ,ICA,PRODV,X(2),Y(2),Z(2),2,AHB)



C NEW CONFORMATIONAL ENERGY (LOCAL)
      ICA(1)=NV1
      ICA(2)=NV2
      IC3=ICONF(NV2,JV3)
      IC2=ICONF(NV1,NV2)
      CNEW=AC(2,IC2)+AC(3,IC3)
      s4=sidgr4(nv1,nv2)
      tx=tlx(s4)
      ty=tly(s4)
      tz=tlz(s4)
      LX1=X2+S1X(nv1,nv2)
      LY1=Y2+S1Y(nv1,nv2)
      LZ1=Z2+S1Z(nv1,nv2)
      MuX1=X(3)+S1X(nv2,jv3)
      MuY1=Y(3)+S1Y(nv2,jv3)
      MuZ1=Z(3)+S1Z(nv2,jv3)
      ENEW=CNEW
     * +ERG(XYZ,INDGL(3),AM,MuX1,MuY1,MuZ1,3)
     * +ERG(XYZ,INDGL(2),AM,LX1,LY1,LZ1,2)
     * +APH(NV1,NV2,JV3,2)+APH(NV2,JV3,ICA(4),3)
     * +EREPUL(XYZ,X2,Y2,Z2,AREP)
     * +EHB(XYZ,ICA,PRODV,X(3),Y(3),Z(3),3,AHB)
     * +EHB(XYZ,ICA,PRODV,X2,Y2,Z2,2,AHB)
C METROPOLIS CRITERION
      DE=ENEW-EOLD
      IF(EXP(-DE).LT.vaxran(rn3)) GO TO 63
      ENERG=ENERG+DE
C SET-IN THE NEW CONFORMATION OF THE TAIL
      X(1)=X1
      Y(1)=Y1
      Z(1)=Z1
      X(2)=X2
      Y(2)=Y2
      Z(2)=Z2
      CALL setin(XYZ,INDGL(1),X1,Y1,Z1,13,13,13,1)
      CALL SETIN(XYZ,INDGL(2),X2,Y2,Z2,NK21,NK22,NK23,2)
      QEND=QEND+1
      GO TO 79

C SET-IN THE OLD CONFORMATION OF THE TAIL
   63 if(indgl(3).eq.0) go to 641
      XYZ(MX1,MY1,MZ1)=0
      XYZ(MX2,MY2,MZ2)=0
      XYZ(MX3,MY3,MZ3)=0
      xyz(mxone,myone,mzone)=0
   64 XYZ(SX1,SY1,SZ1)=-1
      XYZ(SX2,SY2,SZ2)=-1
      XYZ(SX3,SY3,SZ3)=-1
      xyz(xone,yone,zone)=3
  641 continue
      ICA(1)=ICA1
      ICA(2)=ICA2
      CALL setin(XYZ,INDGL(1),X(1),Y(1),Z(1),13,13,13,1)
      CALL SETIN(XYZ,INDGL(2),X(2),Y(2),Z(2),PK21,PK22,PK23,2)



C C-TERMINUS (HEAD)
   79 JV3=ICA(LENF3)
   80 NV2=INT(vaxran(rn1)*24.)+1
      IF(.NOT.GOODC(JV3,NV2)) GO TO 80
   81 NV1=INT(vaxran(rn2)*24.)+1
      IF(.NOT.GOODC(NV2,NV1)) GO TO 81
C CONFORMATION IS OK. CHECK THE EXCLUDED VOLUME
      CALL REMOVE(XYZ,INDGL(LENF),X(LENF),Y(LENF),Z(LENF),13,13,13)
      ICA2=ICA(LENF2)
      ICA1=ICA(LENF1)
      PK21=SIDGR1(ICA2,ICA1)
      PK22=SIDGR2(ICA2,ICA1)
      PK23=SIDGR3(ICA2,ICA1)
      IIII=INDGL(LENF1)
      CALL REMOVE(XYZ,IIII,X(LENF1),Y(LENF1),Z(LENF1),PK21,PK22,PK23)
C CHECK THE ROTATION OF SIDE GROUP ON THIRD BEAD
      if (indgl(lenf2).eq.0) go to 6045
      PK31=SIDGR1(JV3,ICA2)
      PK32=SIDGR2(JV3,ICA2)
      PK33=SIDGR3(JV3,ICA2)
      SX1=X(LENF2)+STLX(PK31)
      SY1=Y(LENF2)+STLY(PK31)
      SZ1=Z(LENF2)+STLZ(PK31)
      SX2=X(LENF2)+STLX(PK32)
      SY2=Y(LENF2)+STLY(PK32)
      SZ2=Z(LENF2)+STLZ(PK32)
      SX3=X(LENF2)+STLX(PK33)
      SY3=Y(LENF2)+STLY(PK33)
      SZ3=Z(LENF2)+STLZ(PK33)
      XYZ(SX1,SY1,SZ1)=0
      XYZ(SX2,SY2,SZ2)=0
      XYZ(SX3,SY3,SZ3)=0
      xone=(sx1+sx2+sx3-x(lenf2))/2
      yone=(sy1+sy2+sy3-y(lenf2))/2
      zone=(sz1+sz2+sz3-z(lenf2))/2
      xyz(xone,yone,zone)=0
      NK31=SIDGR1(JV3,NV2)
      NK32=SIDGR2(JV3,NV2)
      NK33=SIDGR3(JV3,NV2)
      MX1=X(LENF2)+STLX(NK31)
      MY1=Y(LENF2)+STLY(NK31)
      MZ1=Z(LENF2)+STLZ(NK31)
      MX2=X(LENF2)+STLX(NK32)
      MY2=Y(LENF2)+STLY(NK32)
      MZ2=Z(LENF2)+STLZ(NK32)
      MX3=X(LENF2)+STLX(NK33)
      MY3=Y(LENF2)+STLY(NK33)
      MZ3=Z(LENF2)+STLZ(NK33)
      IF(XYZ(MX1,MY1,MZ1).NE.0) GO TO 84
      IF(XYZ(MX2,MY2,MZ2).NE.0) GO TO 84
      IF(XYZ(MX3,MY3,MZ3).NE.0) GO TO 84
      mxone=(mx1+mx2+mx3-x(lenf2))/2
      myone=(my1+my2+my3-y(lenf2))/2
      mzone=(mz1+mz2+mz3-z(lenf2))/2
      if (xyz(mxone,myone,mzone).ne.0) go to 84
      XYZ(MX1,MY1,MZ1)=-1
      XYZ(MX2,MY2,MZ2)=-1
      XYZ(MX3,MY3,MZ3)=-1
      xyz(mxone,myone,mzone)=lenf2



 6045 continue
      NK21=SIDGR1(NV2,NV1)
      NK22=SIDGR2(NV2,NV1)
      NK23=SIDGR3(NV2,NV1)
      WX2=VX(NV2)
      WY2=VY(NV2)
      WZ2=VZ(NV2)
      X2=X(LENF2)+WX2
      Y2=Y(LENF2)+WY2
      Z2=Z(LENF2)+WZ2
      IF(LOOK(XYZ,IIII,X2,Y2,Z2,NK21,NK22,NK23)) GO TO 83
      WX1=VX(NV1)
      WY1=VY(NV1)
      WZ1=VZ(NV1)
      X1=X2+WX1
      Y1=Y2+WY1
      Z1=Z2+WZ1
      IF(LOOK(XYZ,INDGL(LENF),X1,Y1,Z1,13,13,13)) GO TO 83



c old conformational energy (local)
      IC3=ICONF(JV3,ICA2)
      IC2=ICONF(ICA2,ICA1)
      COLD=AC(LENF1,IC2)+AC(LENF2,IC3)
c s4=sidgr4(ica2,ica1)
c tx=tlx(s4)
c ty=tly(s4)
c tz=tlz(s4)
      QX1=X(lenf1)+S1X(ica2,ica1)
      QY1=Y(lenf1)+S1Y(ica2,ica1)
      QZ1=Z(lenf1)+S1Z(ica2,ica1)
      SuX1=X(LENF2)+S1X(jv3,ica2)
      SuY1=Y(LENF2)+S1Y(jv3,ica2)
      SuZ1=Z(LENF2)+S1Z(jv3,ica2)
c QX1=X(LENF1)+STLX(PK21)
c QY1=Y(LENF1)+STLY(PK21)
c QZ1=Z(LENF1)+STLZ(PK21)
c QX2=X(LENF1)+STLX(PK22)
c QY2=Y(LENF1)+STLY(PK22)
c QZ2=Z(LENF1)+STLZ(PK22)
c QX3=X(LENF1)+STLX(PK23)
c QY3=Y(LENF1)+STLY(PK23)
c QZ3=Z(LENF1)+STLZ(PK23)
      EOLD=COLD
     * +ERG(XYZ,INDGL(LENF2),AM,SuX1,SuY1,SuZ1,LENF2)
     * +APH(ICA(LENF4),JV3,ICA2,LENF3)
     * +APH(JV3,ICA2,ICA1,LENF2)
     * +ERG(XYZ,IIII,AM,QX1,QY1,QZ1,LENF1)
     * +EREPUL(XYZ,X(LENF1),Y(LENF1),Z(LENF1),AREP)
     * +EHB(XYZ,ICA,PRODV,X(LENF2),Y(LENF2),Z(LENF2),LENF2,AHB)
     * +EHB(XYZ,ICA,PRODV,X(LENF1),Y(LENF1),Z(LENF1),LENF1,AHB)

C NEW CONFORMATIONAL ENERGY (LOCAL)
      ICA(LENF1)=NV1
      ICA(LENF2)=NV2
      IC3=ICONF(JV3,NV2)
      IC2=ICONF(NV2,NV1)
      CNEW=AC(LENF1,IC2)+AC(LENF2,IC3)
      LX1=X2+S1X(nv2,nv1)
      LY1=Y2+S1Y(nv2,nv1)
      LZ1=Z2+S1Z(nv2,nv1)
      MuX1=X(lenf2)+S1X(jv3,nv2)
      MuY1=Y(lenf2)+S1Y(jv3,nv2)
      MuZ1=Z(lenf2)+S1Z(jv3,nv2)
c LX1=X2+STLX(NK21)
c LY1=Y2+STLY(NK21)
c LZ1=Z2+STLZ(NK21)
c LX2=X2+STLX(NK22)
c LY2=Y2+STLY(NK22)
c LZ2=Z2+STLZ(NK22)
c LX3=X2+STLX(NK23)
c LY3=Y2+STLY(NK23)
c LZ3=Z2+STLZ(NK23)
      ENEW=CNEW
     * +ERG(XYZ,INDGL(LENF2),AM,MuX1,MuY1,MuZ1,LENF2)
     * +APH(ICA(LENF4),JV3,NV2,LENF3)
     * +APH(JV3,NV2,NV1,LENF2)
     * +ERG(XYZ,IIII,AM,LX1,LY1,LZ1,LENF1)
     * +EREPUL(XYZ,X2,Y2,Z2,AREP)
     * +EHB(XYZ,ICA,PRODV,X(LENF2),Y(LENF2),Z(LENF2),LENF2,AHB)
     * +EHB(XYZ,ICA,PRODV,X2,Y2,Z2,LENF1,AHB)



C METROPOLIS CRITERION
C
      DE=ENEW-EOLD
      IF(EXP(-DE).LT.vaxran(rn3)) GO TO 83
      ENERG=ENERG+DE
C SET-IN THE NEW CONFORMATION OF THE HEAD
      X(LENF)=X1
      Y(LENF)=Y1
      Z(LENF)=Z1
      X(LENF1)=X2
      Y(LENF1)=Y2
      Z(LENF1)=Z2
      CALL setin(XYZ,INDGL(LENF),X1,Y1,Z1,13,13,13,1)
      CALL SETIN(XYZ,IIII,X2,Y2,Z2,NK21,NK22,NK23,LENF1)
      QEND=QEND+1
      GO TO 7007

C SET-IN THE OLD CONFORMATION OF THE TAIL
   83 if (indgl(lenf2).eq.0) go to 675
      XYZ(MX1,MY1,MZ1)=0
      XYZ(MX2,MY2,MZ2)=0
      XYZ(MX3,MY3,MZ3)=0
      xyz(mxone,myone,mzone)=0
   84 XYZ(SX1,SY1,SZ1)=-1
      XYZ(SX2,SY2,SZ2)=-1
      XYZ(SX3,SY3,SZ3)=-1
      xyz(xone,yone,zone)=lenf2
  675 continue
      ICA(LENF1)=ICA1
      ICA(LENF2)=ICA2
      CALL setin(XYZ,INDGL(LENF),X(LENF),Y(LENF),Z(LENF),13,13,13,1)
      CALL SETIN(XYZ,IIII,X(LENF1),Y(LENF1),Z(LENF1),
     * PK21,PK22,PK23,LENF1)



C WAVE LIKE MOTION OF THE CHAIN FRAGMENT, VARIOUS CONFORMATIONS
 7007 I=INT(AL9*vaxran(rn2))+3
      if (vaxran(rn3).gt..01) then
      af=1.d0
      else
      af=ah
      end if
C I+2 IS THE CENTRAL BEAD OF THE PIECE TO BE
C JJ IS THE CENTRAL ONE OF THE PIECE TO BE CO
C SEARCH FOR U-SHAPED (OF VARIOUS WIDTH) CONFORMATIONS
      IV2=ICA(I)
      IV5=ICA(I+3)
      VX2=VX(IV2)
      VX5=VX(IV5)
      IF(VX2.NE.-VX5) GO TO 7770
      VY2=VY(IV2)
      VY5=VY(IV5)
      IF(VY2.NE.-VY5) GO TO 7770
      VZ2=VZ(IV2)
      VZ5=VZ(IV5)
      IF(VZ2.NE.-VZ5) GO TO 7770
C LOOK FOR THE SECOND END
      IVA=MOD(ICLOCK,WAVEL)
      MIX=-MIX
      JJ=I+2+MIX*(5+IVA)
      IF(JJ.LT.4.OR.JJ.GT.LENF5) GO TO 7770
C ACCEPTED DOWN THE CHAIN CHOICE
C I-END CONSTRUCTION (CUT-OFF)
      IV3=ICA(I+2)
      IV4=ICA(I+1)
C KINK PERFORMED (KINK=-1)
      IV1=ICA(I-1)
      I4=I+4
      IV6=ICA(I4)
      IF(GOODC(IV1,IV3).AND.GOODC(IV4,IV6)) GO TO 200
C ELSE TRY KINK FLIP OF THE TOP
      INV3=VECT1(IV3,IV4,KINK)
      IV4=VECT2(IV3,IV4,KINK)
      IV3=INV3
      IF(.NOT.GOODC(IV1,IV3)) GO TO 7770
      IF(.NOT.GOODC(IV4,IV6)) GO TO 7770

C CONSTRUCT THE NEW JJ-END
  200 J1=JJ-1
      JV1=ICA(J1)
      JV2=ICA(JJ)
      JVL=ICA(JJ-2)
      J3=JJ+1
      JVP=ICA(J3)
  201 V=INT(vaxran(rn1)*24.)+1
      IF(.NOT.GOODC(JVL,V)) GO TO 201
      WVX=VX(V)
      WVY=VY(V)
      WVZ=VZ(V)
      VM=VECTOR(-WVX,-WVY,-WVZ)
      IF(.NOT.GOODC(VM,JVP)) GO TO 201
      DO KINK=1,5
      jn1=vect1(jv1,jv2,kink)
      if(goodc(v,jn1)) then
      jn2=vect2(jv1,jv2,kink)
      if(goodc(jn2,vm)) go to 202
      END IF
      ENDDO
      GO TO 7770

C MODIFICATION OF THE CONT STRING ARRAY ICA
  202 IF(MIX.GT.0) THEN
      IFIRST=I
      ILAST=J3
      ELSE
      IFIRST=J1
      ILAST=I4
      END IF
      DO J=IFIRST-1,ILAST
      ICAO(J)=ICA(J)
      ENDDO
      IF(MIX.GT.0) THEN
      ICA(I)=IV3
      ICA(I+1)=IV4
      DO J=I4,JJ-2
      ICA(J-2)=ICAO(J)
      ENDDO
      ICA(JJ)=VM
      ICA(J1)=JN2
      ICA(J1-1)=JN1
      ICA(J1-2)=V
      ELSE
      ICA(I4-1)=IV4
      ICA(I4-2)=IV3
      DO J=J3,I-1
      ICA(J+2)=ICAO(J)
      ENDDO
      ICA(J1)=V
      ICA(JJ)=JN1
      ICA(J3)=JN2
      ICA(J3+1)=VM
      END IF



C REMOVE THE STRING
      DO K=IFIRST,ILAST
      II=ICAO(K-1)
      KK=ICAO(K)
      IKS1=SIDGR1(II,KK)
      IKS2=SIDGR2(II,KK)
      IKS3=SIDGR3(II,KK)
      XJ=X(K)
      YJ=Y(K)
      ZJ=Z(K)
      CALL REMOVE(XYZ,INDGL(K),XJ,YJ,ZJ,IKS1,IKS2,IKS3)
      ENDDO

C SETIN AND EXCLUDED
C VOLUME TEST
      IFA=IFIRST-1
      XJ=X(IFA)
      YJ=Y(IFA)
      ZJ=Z(IFA)
      DO J=IFIRST,ILAST
      II=ICA(J-1)
      JJJ=ICA(J)
      XJ=XJ+VX(II)
      YJ=YJ+VY(II)
      ZJ=ZJ+VZ(II)
      XNEW(J)=XJ
      YNEW(J)=YJ
      ZNEW(J)=ZJ
      IKS1=SIDGR1(II,JJJ)
      IKS2=SIDGR2(II,JJJ)
      IKS3=SIDGR3(II,JJJ)
      IF(LOOK(XYZ,INDGL(J),XJ,YJ,ZJ,IKS1,IKS2,IKS3)) THEN
C THEN REMOVE AND TERMINATE
      IF(J.EQ.IFIRST) GO TO 204
      DO K=IFIRST,J-1
      KK=ICA(K-1)
      KKK=ICA(K)
      IKS1=SIDGR1(KK,KKK)
      IKS2=SIDGR2(KK,KKK)
      IKS3=SIDGR3(KK,KKK)
      CALL REMOVE(XYZ,INDGL(K),XNEW(K),YNEW(K),ZNEW(K),IKS1,IKS2,IKS3)
      ENDDO
  204 DO I=IFIRST,ILAST
      ICA(I)=ICAO(I)
      II=ICA(I-1)
      JJJ=ICA(I)
      IKS1=SIDGR1(II,JJJ)
      IKS2=SIDGR2(II,JJJ)
      IKS3=SIDGR3(II,JJJ)
      CALL setin(XYZ,INDGL(I),X(I),Y(I),Z(I),IKS1,IKS2,IKS3,I)
      ENDDO
      GO TO 7770
      ELSE
C SET NEW BEAD
      CALL SETIN(XYZ,INDGL(J),XJ,YJ,ZJ,IKS1,IKS2,IKS3,J)
      END IF
      ENDDO
C THE NEW STRING KEEPS EXCLUDED VOLUME



C COMPUTATION OF ENERGY OF THE NEW CONFORMATION AND REMOVE STRING
      ENEW=0.
      ER=0.
      E=0.
      DO J=IFIRST,ILAST
      I=J-1
      II=ICA(I)
      JJ=ICA(J)
      XJ=XNEW(J)
      YJ=YNEW(J)
      ZJ=ZNEW(J)
      IKS1=SIDGR1(II,JJ)
      IKS2=SIDGR2(II,JJ)
      IKS3=SIDGR3(II,JJ)
C ROTATIONAL CONTRIBUTION
      JCONF=ICONF(II,JJ)
      ENEW=ENEW+AC(J,JCONF)+APH(II,JJ,ICA(J+1),J)
C INTERACTIONS OF SIDE GROUPS
      IX1=XJ+S1X(ii,jj)
      IY1=YJ+S1Y(ii,jj)
      IZ1=ZJ+S1Z(ii,jj)
      E=E+ERG(XYZ,INDGL(J),AM,IX1,IY1,IZ1,J)
C COOPERATIVE AND HYDROGEN BOND
      E=E+EHB(XYZ,ICA,PRODV,XJ,YJ,ZJ,J,AHB)
C REPULSIVE INTERACTIONS
      ER=ER+EREPUL(XYZ,XJ,YJ,ZJ,AREP)
      CALL REMOVE(XYZ,INDGL(J),XJ,YJ,ZJ,IKS1,IKS2,IKS3)
      ENDDO
      ENEW=ENEW+APH(ICA(IFIRST-2),ICA(IFA),ICA(IFIRST),IFA)
      ENEW=ENEW+E+ER



C COMPUTATION OF THE OLD ENERGY AND SETIN OF THE CHAIN PIECE
      DO J=IFIRST,ILAST
      I=ICA(J)
      ICA(J)=ICAO(J)
      ICAO(J)=I
      ENDDO
C NEW ICA STORED IN ICAO AT THIS POINT
      EOLD=0.
      ER=0.
      E=0.
      DO J=IFIRST,ILAST
      XJ=X(J)
      YJ=Y(J)
      ZJ=Z(J)
      II=ICA(J-1)
      JJJ=ICA(J)
      IKS1=SIDGR1(II,JJJ)
      IKS2=SIDGR2(II,JJJ)
      IKS3=SIDGR3(II,JJJ)
c s4=sidgr4(ii,jjj)
c tx=tlx(s4)
c ty=tly(s4)
c tz=tlz(s4)
      CALL setin(XYZ,INDGL(J),XJ,YJ,ZJ,IKS1,IKS2,IKS3,J)
      JCONF=ICONF(II,JJJ)
      EOLD=EOLD+AC(J,JCONF)+APH(II,JJJ,ICA(J+1),J)
C INTERACTIONS OF SIDE GROUPS
      IX1=XJ+S1X(ii,jjj)
      IY1=YJ+S1Y(ii,jjj)
      IZ1=ZJ+S1Z(ii,jjj)
      E=E+ERG(XYZ,INDGL(J),AM,IX1,IY1,IZ1,J)
C COOPERATIVE AND HYDROGEN BOND
      E=E+EHB(XYZ,ICA,PRODV,XJ,YJ,ZJ,J,AHB)
C REPULSIVE INTERACTIONS
      ER=ER+EREPUL(XYZ,XJ,YJ,ZJ,AREP)
      ENDDO
      EOLD=EOLD+APH(ICA(IFIRST-2),ICA(IFA),ICA(IFIRST),IFA)
      EOLD=EOLD+E+ER



C METROPOLIS CRITERION
      DE=ENEW-EOLD
      IF(EXP(-DE*af).GT.vaxran(rn3)) THEN
      QWAVE=QWAVE+1
      ENERG=ENERG+DE
      DO J=IFIRST,ILAST
      II=ICA(J-1)
      JJ=ICA(J)
      IKS1=SIDGR1(II,JJ)
      IKS2=SIDGR2(II,JJ)
      IKS3=SIDGR3(II,JJ)
      CALL REMOVE(XYZ,INDGL(J),X(J),Y(J),Z(J),IKS1,IKS2,IKS3)
      ENDDO
      DO J=IFIRST,ILAST
      XJ=XNEW(J)
      YJ=YNEW(J)
      ZJ=ZNEW(J)
      X(J)=XJ
      Y(J)=YJ
      Z(J)=ZJ
      II=ICAO(J-1)
      JJ=ICAO(J)
      IKS1=SIDGR1(II,JJ)
      IKS2=SIDGR2(II,JJ)
      IKS3=SIDGR3(II,JJ)
      CALL setin(XYZ,INDGL(J),XJ,YJ,ZJ,IKS1,IKS2,IKS3,J)
      ICA(J)=ICAO(J)
      ENDDO
      ENDIF
 7770 CONTINUE

OPTIONAL NORMALISATION 0? THE COORDINATES 

SX-0 
SY-0 
SZ-0 

DO I-1,LENF 
SX«SX A X(I) 
SY-SY*Y(I) 
SZ-SZ+Z(I) 
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zt(i)-z(i)-szc 
end do 

vrite(13,*)(xt(i) ,yt(i) ,zt(i) ,i-l,lenf ) 

R2-(X(LENF)-X(1))**2+(Y(LEMF)-Y(1))**2+(Z(LENF)-Z(1)}**2 

asumr2-asumr2+r2 

etot2-etot2+energ*energ 

etot-etot+energ 

AS2-0. 

SX-0 

SY-0 

SZ-0 

DO I-1,LENF 

SX-SX+X(I) 

SY-SY+Y(I) 

SZ-SZ+Z(I) 

ENDDO 

C CENTRE 0? GRAVITY COORDINATES 

AS X-= FLOAT ( SX ) /LENT 
ASY-FLOAT ( SY ) /LENT 
AS Z- FLOAT ( S Z ) /LENT 

DO I -1, LENT 

BX-(ASX-X(I))**2 
BY-(ASY-Y(I))**2 
BZ-(ASZ-Z(I) )**2 
AS2-AS2*BX-H3Y+BZ 
ENDDO 

AS 2 -AS 2 /LENF 
asums 2- a sums 2 + a s 2 
c insertion cf the native contact pairs 

nt-0 . dO 
nct-0 . 

do 1400 i-2,ler.f 2 
' k-i-I 

ii- ica(i-l) 

iii- ica(i) 

SXl-slx ( ii / iii ) -x ( i ; 
Syl-sly ( ii , iii )-y(L) 
Szl-slz ( ii , iii ) +z { i ) 

DO 1400 J-K , LENT1 

c it is not counting the nearestl down-the-chain neighbours . vhich may be 

c use full for some purposes 

if (iabs(i-j ) .ea.l) co to 1400 

iR2-(X(J)-X(I))»*2-s-(Y(J)-Y(I))»*2-(Z(J)-Z<:j>«2 
IF(iR2.GT.18) GO TO 1400 

ri-iCA(j-i) 



III-ICA(J) 
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kxi«six<ii,i::)+x(J) 
kyi°siy(II,iid+y(J) 
kz1«s1z(ii,iii)+z(j> 

Rll- ( SX1-KX1 ) * * 2+ { SY1-KY1 ) * * 2+ ( SZ1-KZ1 ) * * 2 
IF(R11.EQ.2) then 
nct°nct+inc(i, j) 
nt-nt+1 
end if 
1400 CONTINUE 

ant°ant+nt 
anct-anct+nct 

WRITE(6,8009) I TERM/ R2, AS2, ENERG,nct,nt 
8009 FORMAT(2X,2I5,F8.2,F10.4,3x,i3,2x,i3) 

C 

7777 CONTINUE 

" 900 o CONTINUE 
REWIND (UNIT-10) 
WRITE (10,8000) LENF 
DO 1=1, LENF 

WRITE (10, 8000) X(I),Y<I),C£I) 
ENDDO 

c DO I»2,LENF1 

2 WRITE(6,8000) I, X( I ) , Y( I ) , Z ( I ) , ICA( I ) , ICONF( ICA( Z-l;,:CA< I) ) 

C ENDDO 



C ACCEPTANCE RATIOS FOR VARIOUS MOVES 

FKINK-FLOAT ( QKINK ) /FLOAT ( NCYCLE* PHOT ) /AL4/100 . 

FWAVE=FLOAT ( QWAVE ) /FLOAT ( NCYCLE* PHOT ) /10 0 . 

FROT 63 FLOAT ( QROT ) /FLOAT ( NCYCLE* PHOT ) /l 00 . 

FEND= FLOAT ( OEND ) /FLOAT ( NCYCLE * PHOT ) /2 00 . 

etot-etot/ncycie 

etot2-etot2/ncycle 

cv°etot2-etot*etot 

asumr2°asumr2/ncycle 

asums2°asums2/ncycle 

anct=anct/ncycie 
ant=ant/ncycle 
. write( 6 ,9942) etot , etot2 , cv, asumr2 , asurcs2 , anct , ant 
9942 format(lx, , <E>= ( ,1F10.4,5X, ' <E2>° ' ,lPdl5 . 8 , 5X , 1 CV» * .Ipdlr.E,/, 
•mean-square-end to end vector** ' ,lpdl 5 . e,/, 

• <S2>« ' ,lpdl5. 8,/, 'native contacts- ' , lpd!5 . E ,/, 
* ■ number of contacts**' ,lpdl5.8,/) 

vrite<6,8012) 

format (lx, ' fkink- fend ° fvave- f rcr= ' ) 

WRITE(6,8002) FKINK , FEND , FWAVE , FROT 

DETAILED ANALYSIS OF THE CHAIN STRUCTURE 
C 



3012 
C 
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write ( 16 , 181 ) atemp , etot , cv , asuinr 2 , asu,-ns2 , anct , ant 
iei format( lx , If B . 3 , 2x, 6 ( If 8 . 3 , 2x) ) 

      WRITE(6,8004)
 8004 FORMAT(1X,//,8X,'VECTOR1 VECTOR2 ',8x,' SIDE 1/2/3',8x,
     *  ' HANDEDNES ',/)

      DO I=2,LENF1
      II=ICA(I-1)
      JJ=ICA(I)
      KK=ICA(I+1)

      icj=ICONF(II,JJ)

      SID1=SIDGR1(II,JJ)
      SID2=SIDGR2(II,JJ)
      SID3=SIDGR3(II,JJ)

      WX1=VX(II)
      WY1=VY(II)
      WZ1=VZ(II)
      WX2=VX(JJ)
      WY2=VY(JJ)
      WZ2=VZ(JJ)
      WX3=VX(KK)
      WY3=VY(KK)
      WZ3=VZ(KK)

      R3=(WX1+WX2+WX3)**2+(WY1+WY2+WY3)**2+(WZ1+WZ2+WZ3)**2
      W1X=(S1X(ii,jj))*INDGL(I)
      W1y=(S1y(ii,jj))*INDGL(I)
      W1z=(S1z(ii,jj))*INDGL(I)

C     specification of the non interacting three sites

      W2X=stlx(sid1)*INDGL(I)
      W2y=stly(sid1)*INDGL(I)
      W2z=stlz(sid1)*INDGL(I)

      W3X=stlx(sid2)*INDGL(I)
      W3y=stly(sid2)*INDGL(I)
      W3z=stlz(sid2)*INDGL(I)

      W4X=stlx(sid3)*INDGL(I)
      W4y=stly(sid3)*INDGL(I)
      W4z=stlz(sid3)*INDGL(I)

      IHX=(WY1*WZ2-WY2*WZ1)*INDGL(I)
      IHY=(WX2*WZ1-WZ2*WX1)*INDGL(I)
      IHZ=(WX1*WY2-WY1*WX2)*INDGL(I)
      IH1=ISIGN(1,(IHX*W1X+IHY*W1Y+IHZ*W1Z))

      WRITE(6,8003) I,WX1,WY1,WZ1,WX2,WY2,WZ2,
     *  W1X,W1Y,W1Z,W2X,W2Y,W2Z,W3X,W3Y,W3Z,IH1,icj,r3
      ENDDO

C     CENTRE OF GRAVITY COORDINATES

      ASX=FLOAT(SX)/LENF
      ASY=FLOAT(SY)/LENF
      ASZ=FLOAT(SZ)/LENF
      XSHIFT=MID-ASX
      YSHIFT=MID-ASY
      ZSHIFT=MID-ASZ
      IF((XSHIFT**2+YSHIFT**2+ZSHIFT**2).GT.30) THEN

C     NORMALISATION
      CALL REMOVE(XYZ,INDGL(1),X(1),Y(1),Z(1),13,13,13)
      CALL REMOVE(XYZ,INDGL(LENF),X(LENF),Y(LENF),Z(LENF),13,13,13)

      DO I=2,LENF1
      II=ICA(I-1)
      JJ=ICA(I)
      SID1=SIDGR1(II,JJ)
      SID2=SIDGR2(II,JJ)
      SID3=SIDGR3(II,JJ)
      CALL REMOVE(XYZ,INDGL(I),X(I),Y(I),Z(I),SID1,SID2,SID3)
      ENDDO

      DO I=1,LENF
      X(I)=X(I)+XSHIFT
      Y(I)=Y(I)+YSHIFT
      Z(I)=Z(I)+ZSHIFT
      ENDDO

      sxd=sxd-xshift
      syd=syd-yshift
      szd=szd-zshift

      CALL setin(XYZ,INDGL(1),X(1),Y(1),Z(1),13,13,13,1)
      CALL setin(XYZ,INDGL(LENF),X(LENF),Y(LENF),Z(LENF),13,13,13,1)

      DO I=2,LENF1
      II=ICA(I-1)
      JJ=ICA(I)
      SID1=SIDGR1(II,JJ)
      SID2=SIDGR2(II,JJ)
      SID3=SIDGR3(II,JJ)
      CALL setin(XYZ,INDGL(I),X(I),Y(I),Z(I),SID1,SID2,SID3,I)
      ENDDO

      END IF



 7700 CONTINUE

      write(13,*) iterm,energ,sxd,syd,szd

      do i=1,lenf
      xt(i)=x(i)+sxd
      yt(i)=y(i)+syd
      ENDDO



 8003
 8002 FORMAT(1X,//,5X,5F10.6)
      write(6,8010)
 8010 format(1x,/,5x,50(1h-),/)
 8000 FORMAT(3X,5I4,I6)

      do i=6,18,2
      write(6,8005) i,iflip(i,1),iflip(i,2),iflip(i,3),iflip(i,4),
     *  iflip(i,5)
      enddo
 8005 format(1x,i5,5i8)

c     test1 of occupancy - a direct one

      lenfo7=lenf*7
      lenf23=lenf2-NBGL
      write(6,8006) lenf,lenfo7,lenf23,nbgl
 8006 format(1x,/,5x,'lenf ... nbgl ncbl ',4i6,/)

      isi=0
      ione=0
      ioc=0

      do xx=1,max
      do yy=1,max
      do zz=1,max
      point=xyz(xx,yy,zz)
      if(point.ne.0) then
        if(point.gt.0) then
          isi=isi+1
        else
          if(point.eq.-1) ione=ione+1
          ioc=ioc+1
        endif
      endif
      enddo
      enddo
      enddo

c     writing of the backbone and side groups coordinates - without the
c     central (inert) bead
c
c     do i=2,lenf2
c     ii=ica(i-1)
c     iii=ica(i)
c     SX1=stlX(SIDGR1(II,III))+X(I)
c     SY1=stlY(SIDGR1(II,III))+Y(I)
c     SZ1=stlZ(SIDGR1(II,III))+Z(I)
c     SX2=stlX(SIDGR2(II,III))+X(I)
c     SY2=stlY(SIDGR2(II,III))+Y(I)
c     SZ2=stlZ(SIDGR2(II,III))+Z(I)
c     SX3=stlX(SIDGR3(II,III))+X(I)
c     SY3=stlY(SIDGR3(II,III))+Y(I)
c     SZ3=stlZ(SIDGR3(II,III))+Z(I)
c     write(6,420) x(i),y(i),z(i),sx1,sy1,sz1,sx2,sy2,sz2,
c    *  sx3,sy3,sz3
c 420 format(5x,4(2x,3i4))
c     enddo

      ioc=ioc-3*(lenf2-nbgl)

      ione=(ione-14)/3-2
c     there are 3 -1 per side chain that is not a glycine
c     let us count the number of excess minus ones

      write(6,8007) ione,ioc,isi
 8007 format(1x,//,1X,'L-GLY / occupancy and side groups ',3i6,//,
     *  5x,' ************ contact map ************',/)

      do 400 i=2,lenf2
      ii=ica(i-1)
      iii=ica(i)
c     s4=sidgr4(ii,iii)
c     tx=tlx(s4)
c     ty=tly(s4)
c     tz=tlz(s4)
c     ih1=ihan1(ii,iii)
c     ih2=ihan2(ii,iii)
c     ih3=ihan3(ii,iii)

      SX1=s1x(ii,iii)+x(i)
      Sy1=s1y(ii,iii)+y(i)
      Sz1=s1z(ii,iii)+z(i)

      DO 400 J=I,LENF1
c     it is not counting the nearest down-the-chain neighbours, which may be
c     useful for some purposes
      if(iabs(i-j).eq.1) go to 400

      R2=(X(J)-X(I))**2+(Y(J)-Y(I))**2+(Z(J)-Z(I))**2
      IF(R2.GT.18) GO TO 400

      II=ICA(J-1)
      III=ICA(J)
      KX1=S1X(II,III)+X(J)
      KY1=S1Y(II,III)+Y(J)
      KZ1=S1Z(II,III)+Z(J)

      R11=(SX1-KX1)**2+(SY1-KY1)**2+(SZ1-KZ1)**2
      mult=0
      IF(R11.EQ.2) MULT=MULT+1
      IF(MULT.EQ.0) GO TO 400

      IF(am(i,j).gt.0) THEN
      WRITE(6,410) I,J,am(i,j)
      elseIF(am(i,j).lt.0) THEN
      WRITE(6,411) I,J,am(i,j)
      else
      WRITE(6,412) I,J,am(i,j)
      ENDIF

  400 CONTINUE

  410 FORMAT(5X,I3,' and',I3,' residues are repulsive ',1f10.4)
  411 FORMAT(3X,' ** ',I3,' and',I3,' are attractive ',1f10.4)
  412 format(5x,i3,' and',i3,' are inert ',1f10.4)

      CLOSE(UNIT=1)
      CLOSE(UNIT=2)
      CLOSE(UNIT=5)
      CLOSE(UNIT=6)
      CLOSE(UNIT=10)

      STOP
      END
      function vaxran(iseed)
      equivalence (iyfl,yfl2)
      data mask,mask2/x'3f000000',x'3f800000'/

      iseed=iseed*69069+1
      nseed=rshift(iseed,8)

      if(iseed.lt.0) then
      iyfl=mask2+nseed
      vaxran=yfl2
      else
      iyfl=mask+nseed
      vaxran=yfl2-.5
      endif
      return
      end
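The generator above advances its seed with the multiplier 69069 and the increment 1. The recurrence can be sketched in Python as follows. This is a minimal illustration, not a port of the listing: it assumes 32-bit wraparound integer arithmetic, the names lcg_step and uniform_stream are hypothetical, and the bit-level packing of the shifted seed into a floating-point word is replaced by a plain division, so the values will not match the Fortran routine bit for bit.

```python
MODULUS = 2 ** 32  # assumption: integers wrap around at 32 bits


def lcg_step(seed):
    """Advance the seed once: seed * 69069 + 1, reduced modulo 2**32."""
    return (seed * 69069 + 1) % MODULUS


def uniform_stream(seed, count):
    """Return `count` pseudo-random floats in [0, 1).

    Simplified mapping (assumption): the new seed is divided by 2**32
    instead of being packed into the mantissa of a float.
    """
    values = []
    for _ in range(count):
        seed = lcg_step(seed)
        values.append(seed / MODULUS)
    return values
```

A caller would keep the returned seed between draws, much as the Fortran function updates iseed in place.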
C     SIDE *3 ...GLYCINE
C     REMOVES THE CLUSTER (RESIDUE + SIDE GROUP)

      SUBROUTINE REMOVE(XYZ,INDGL,JX,JY,JZ,ID1,ID2,ID3)
      INTEGER XYZ(150,150,150),STLX(13),STLY(13),STLZ(13)

C     FCC LATTICE VECTORS (AND 000)

      DATA STLX /4*0,-1,1,-1,1,-1,1,-1,1,0/
      DATA STLY /-1,1,-1,1,1,-1,4*0,-1,1,0/
      DATA STLZ /1,-1,-1,1,2*0,1,-1,-1,1,3*0/
C
      IF(INDGL.EQ.0) GO TO 88
      IX=JX+STLX(ID1)
      IY=JY+STLY(ID1)
      IZ=JZ+STLZ(ID1)
      XYZ(IX,IY,IZ)=0
      IIX=JX+STLX(ID2)
      IIY=JY+STLY(ID2)
      IIZ=JZ+STLZ(ID2)
      XYZ(IIX,IIY,IIZ)=0
      IIIX=JX+STLX(ID3)
      IIIY=JY+STLY(ID3)
      IIIZ=JZ+STLZ(ID3)
      XYZ(IIIX,IIIY,IIIZ)=0

      LX=(IX+IIX+IIIX-JX)/2
      LY=(IY+IIY+IIIY-JY)/2
      LZ=(IZ+IIZ+IIIZ-JZ)/2
      XYZ(LX,LY,LZ)=0
   88 XYZ(JX,JY,JZ)=0
      IXL=JX-1
      XYZ(IXL,JY,JZ)=0
      IXP=JX+1
      XYZ(IXP,JY,JZ)=0
      IYL=JY-1
      XYZ(JX,IYL,JZ)=0
      IYP=JY+1
      XYZ(JX,IYP,JZ)=0
      IZL=JZ-1
      XYZ(JX,JY,IZL)=0
      IZP=JZ+1
      XYZ(JX,JY,IZP)=0

      RETURN
      END
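The cluster cleared by REMOVE consists of the backbone site, its six cubic neighbours and, for a non-glycine residue, three side-group sites reached by FCC vectors plus a fourth site derived from them. A minimal Python sketch of which lattice sites that cluster covers is given below. The function name cluster_sites and the has_side_group flag are hypothetical; the vector tables are transcribed from the DATA statements, and the sketch assumes positive lattice coordinates, so that Python's floor division agrees with Fortran integer division.

```python
# FCC nearest-neighbour vectors plus the null vector, transcribed from the
# STLX/STLY/STLZ tables (entry 13 of the 1-based listing is (0,0,0)).
STLX = [0, 0, 0, 0, -1, 1, -1, 1, -1, 1, -1, 1, 0]
STLY = [-1, 1, -1, 1, 1, -1, 0, 0, 0, 0, -1, 1, 0]
STLZ = [1, -1, -1, 1, 0, 0, 1, -1, -1, 1, 0, 0, 0]


def cluster_sites(jx, jy, jz, id1, id2, id3, has_side_group=True):
    """Return the lattice sites of one residue cluster (ids are 1-based)."""
    sites = []
    if has_side_group:
        corners = [
            (jx + STLX[i - 1], jy + STLY[i - 1], jz + STLZ[i - 1])
            for i in (id1, id2, id3)
        ]
        sites.extend(corners)
        # Fourth side-group site, same arithmetic as LX, LY, LZ.
        centre = (jx, jy, jz)
        sites.append(tuple(
            (sum(c[axis] for c in corners) - centre[axis]) // 2
            for axis in range(3)
        ))
    # Backbone site and its six cubic neighbours.
    sites.append((jx, jy, jz))
    for dx, dy, dz in ((-1, 0, 0), (1, 0, 0), (0, -1, 0),
                       (0, 1, 0), (0, 0, -1), (0, 0, 1)):
        sites.append((jx + dx, jy + dy, jz + dz))
    return sites
```

Clearing or marking the cluster then amounts to writing one value per returned site, which is what REMOVE does with zeros.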
C     SIDE *3 ..glycine
C     THIS SUBROUTINE SETS -INDEX TO THE EXCLUDED VOLUME ENVELOPE
C     [illegible] AT THE SIDE GROUP POSITION. BOTH THE
C     [illegible] ARE CODED -1 (IT IS USED IN ENERGY CALCULATIONS)
C     ONLY THE FCC LATTICE VECTORS ARE ALLOWED TO INTERACT
C     [illegible] OF HANDEDNESS SO THAT ALL INTERACTING POINTS

      SUBROUTINE SETIN(XYZ,INDGL,JX,JY,JZ,ID1,ID2,ID3,IND)
      INTEGER XYZ(150,150,150),STLX(13),STLY(13),STLZ(13)

C     FCC LATTICE VECTORS (AND 000)

      DATA STLX /4*0,-1,1,-1,1,-1,1,-1,1,0/
      DATA STLY /-1,1,-1,1,1,-1,4*0,-1,1,0/
      DATA STLZ /1,-1,-1,1,2*0,1,-1,-1,1,3*0/
      if(indgl.eq.0) go to 88

      IX=JX+STLX(ID1)
      IY=JY+STLY(ID1)
      IZ=JZ+STLZ(ID1)
      XYZ(IX,IY,IZ)=-1
      IIX=JX+STLX(ID2)
      IIY=JY+STLY(ID2)
      IIZ=JZ+STLZ(ID2)
      XYZ(IIX,IIY,IIZ)=-1
      IIIX=JX+STLX(ID3)
      IIIY=JY+STLY(ID3)
      IIIZ=JZ+STLZ(ID3)
      XYZ(IIIX,IIIY,IIIZ)=-1

      LX=(IX+IIX+IIIX-JX)/2
      LY=(IY+IIY+IIIY-JY)/2
      LZ=(IZ+IIZ+IIIZ-JZ)/2
      XYZ(LX,LY,LZ)=ind
   88 XYZ(JX,JY,JZ)=-IND
      IXL=JX-1
      XYZ(IXL,JY,JZ)=-IND
      IXP=JX+1
      XYZ(IXP,JY,JZ)=-IND
      IYL=JY-1
      XYZ(JX,IYL,JZ)=-IND
      IYP=JY+1
      XYZ(JX,IYP,JZ)=-IND
      IZL=JZ-1
      XYZ(JX,JY,IZL)=-IND
      IZP=JZ+1
      XYZ(JX,JY,IZP)=-IND

      RETURN
      END
C     SIDE *3 ...glycine
C     CHECK OF OCCUPANCY - ENTIRE CLUSTER (RESIDUE+SIDE GROUP)
      FUNCTION LOOK(XYZ,INDGL,JX,JY,JZ,ID1,ID2,ID3)
      LOGICAL LOOK
      INTEGER XYZ(150,150,150),STLX(13),STLY(13),STLZ(13)
C     FCC LATTICE VECTORS (AND 000)
      DATA STLX /4*0,-1,1,-1,1,-1,1,-1,1,0/
      DATA STLY /-1,1,-1,1,1,-1,4*0,-1,1,0/
      DATA STLZ /1,-1,-1,1,2*0,1,-1,-1,1,3*0/
      LOOK=.FALSE.

      IF(INDGL.EQ.0) GO TO 88
      IX=JX+STLX(ID1)
      IY=JY+STLY(ID1)
      IZ=JZ+STLZ(ID1)
      IF(XYZ(IX,IY,IZ).NE.0) THEN
      LOOK=.TRUE.
      RETURN
      ENDIF
      IIX=JX+STLX(ID2)
      IIY=JY+STLY(ID2)
      IIZ=JZ+STLZ(ID2)
      IF(XYZ(IIX,IIY,IIZ).NE.0) THEN
      LOOK=.TRUE.
      RETURN
      ENDIF
      IIIX=JX+STLX(ID3)
      IIIY=JY+STLY(ID3)
      IIIZ=JZ+STLZ(ID3)
      IF(XYZ(IIIX,IIIY,IIIZ).NE.0) THEN
      LOOK=.TRUE.
      RETURN
      ENDIF

      LX=(IX+IIX+IIIX-JX)/2
      LY=(IY+IIY+IIIY-JY)/2
      LZ=(IZ+IIZ+IIIZ-JZ)/2
      IF(XYZ(LX,LY,LZ).NE.0) THEN
      LOOK=.TRUE.
      RETURN
      ENDIF
C
   88 IF(XYZ(JX,JY,JZ).NE.0) THEN
      LOOK=.TRUE.
      RETURN
      ENDIF

      IXL=JX-1
      IF(XYZ(IXL,JY,JZ).NE.0) THEN
      LOOK=.TRUE.
      RETURN
      ENDIF

      IXP=JX+1
      IF(XYZ(IXP,JY,JZ).NE.0) THEN
      LOOK=.TRUE.
      RETURN
      ENDIF

      IYL=JY-1
      IF(XYZ(JX,IYL,JZ).NE.0) THEN
      LOOK=.TRUE.
      RETURN
      ENDIF

      IYP=JY+1
      IF(XYZ(JX,IYP,JZ).NE.0) THEN
      LOOK=.TRUE.
      RETURN
      ENDIF

      IZL=JZ-1
      IF(XYZ(JX,JY,IZL).NE.0) THEN
      LOOK=.TRUE.
      RETURN
      ENDIF

      IZP=JZ+1
      IF(XYZ(JX,JY,IZP).NE.0) LOOK=.TRUE.

      RETURN
      END
C     side 3 glycine
C     THIS FUNCTION COMPUTES THE STRENGTH OF INTERACTIONS BETWEEN THE
C     SIDE GROUPS - only the nearest neighbours r=1, for aklG2gly.f
c     all interactions are at a distance 2
c
c     program setind.f 5/12/89

      FUNCTION ERG(XYZ,INDGL,AM,KSX,KSY,KSZ,J)
      DIMENSION AM(150,150)
      INTEGER XYZ(150,150,150),P1,P2,P3,P4,P5,P6,p7,p8,p9
      integer p10,p11,p12

      ERG=0.
      IF(INDGL.EQ.0) RETURN

      IX=KSX-1
      JX=KSX+1
      IY=KSY-1
      JY=KSY+1
      IZ=KSZ-1
      JZ=KSZ+1
C
c     vectors in the z plane

      P1=XYZ(JX,jy,KSZ)
      IF(P1.GT.0) ERG=ERG+AM(P1,J)
      P2=XYZ(jX,iY,KSZ)
      IF(P2.GT.0) ERG=ERG+AM(P2,J)
      P3=XYZ(iX,jy,KSZ)
      IF(P3.GT.0) ERG=ERG+AM(P3,J)
      p4=XYZ(iX,iY,KSZ)
      IF(p4.GT.0) ERG=ERG+AM(p4,J)

c     vectors in the x plane

      p5=XYZ(KSX,JY,jZ)
      IF(p5.GT.0) ERG=ERG+AM(p5,J)
      p6=XYZ(KSX,IY,jZ)
      IF(p6.GT.0) ERG=ERG+AM(p6,J)
      p7=XYZ(KSX,JY,iZ)
      IF(p7.GT.0) ERG=ERG+AM(p7,J)
      p8=XYZ(KSX,IY,iz)
      IF(p8.GT.0) ERG=ERG+AM(p8,J)

C     VECTORS IN THE Y PLANE

      P9=XYZ(IX,KSY,JZ)
      IF(P9.GT.0) ERG=ERG+AM(P9,J)
      P10=XYZ(IX,KSY,IZ)
      IF(P10.GT.0) ERG=ERG+AM(P10,J)
      P11=XYZ(JX,KSY,JZ)
      IF(P11.GT.0) ERG=ERG+AM(P11,J)
      P12=XYZ(JX,KSY,IZ)
      IF(P12.GT.0) ERG=ERG+AM(P12,J)
      RETURN
      END
C     repulsion only to r2=5
C
C     THIS FUNCTION COMPUTES THE STRENGTH OF REPULSIVE INTERACTIONS

      FUNCTION EREPUL(XYZ,X,Y,Z,AREP)
      INTEGER XYZ(150,150,150),X,Y,Z
      DATA LO /-1/
      I=0

      IX=X-1
      JX=X+1
      IY=Y-1
      JY=Y+1
      IZ=Z-1
      JZ=Z+1

c     fcc lattice

      IF(XYZ(IX,IY,Z).LT.LO) I=I+1
      IF(XYZ(IX,JY,Z).LT.LO) I=I+1
      IF(XYZ(JX,IY,Z).LT.LO) I=I+1
      IF(XYZ(JX,JY,Z).LT.LO) I=I+1
      IF(XYZ(X,IY,IZ).LT.LO) I=I+1
      IF(XYZ(X,IY,JZ).LT.LO) I=I+1
      IF(XYZ(X,JY,IZ).LT.LO) I=I+1
      IF(XYZ(X,JY,JZ).LT.LO) I=I+1
      IF(XYZ(IX,Y,IZ).LT.LO) I=I+1
      IF(XYZ(IX,Y,JZ).LT.LO) I=I+1
      IF(XYZ(JX,Y,IZ).LT.LO) I=I+1
      IF(XYZ(JX,Y,JZ).LT.LO) I=I+1
      EREPUL=I*AREP
      RETURN
      END
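EREPUL counts how many of the twelve FCC neighbours of a site hold a value below -1 and scales that count by AREP. The same counting rule can be sketched in Python as below. This is an illustration under stated assumptions: the lattice is modelled as a sparse dictionary keyed by (x, y, z) with missing sites treated as empty, which is a choice of this sketch and not how the Fortran array is stored, and the names repulsion_energy and FCC_NEIGHBOURS are hypothetical.

```python
# The twelve FCC nearest-neighbour offsets: exactly two non-zero components,
# each of magnitude 1, so the squared length is 2.
FCC_NEIGHBOURS = [
    (dx, dy, dz)
    for dx in (-1, 0, 1)
    for dy in (-1, 0, 1)
    for dz in (-1, 0, 1)
    if dx * dx + dy * dy + dz * dz == 2
]


def repulsion_energy(lattice, x, y, z, arep, threshold=-1):
    """Count FCC neighbours whose value is below `threshold`, times `arep`.

    `lattice` is a dict mapping (x, y, z) to an integer occupancy code;
    absent keys are read as 0 (empty), an assumption of this sketch.
    """
    count = sum(
        1
        for dx, dy, dz in FCC_NEIGHBOURS
        if lattice.get((x + dx, y + dy, z + dz), 0) < threshold
    )
    return count * arep
```

With the default threshold of -1, sites coded exactly -1 and sites holding positive indices do not contribute, matching the .LT.LO tests in the listing.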
C     HYDROGEN BONDING AND "COOPERATIVITY" (BETA AND ALPHA MOTIFS)

      FUNCTION EHB(XYZ,ICA,PRODV,IX,IY,IZ,ID,AHB)
      INTEGER XYZ(150,150,150),ICA(0:150),PRODV(24,24)
      DATA LO /-1/
      I=0

      IXL=IX-3
      IXP=IX+3
      IYL=IY-3
      IYP=IY+3
      IZL=IZ-3
      IZP=IZ+3
      IC1=ICA(ID-1)
      IC2=ICA(ID)

      IF(XYZ(IXL,IY,IZ).LT.LO) THEN
      IDD=-XYZ(IXL,IY,IZ)
      IN1=ICA(IDD-1)
      IN2=ICA(IDD)
      I=I+PRODV(IC1,IN1)+PRODV(IC1,IN2)+PRODV(IC2,IN1)+PRODV(IC2,IN2)
      ENDIF

      IF(XYZ(IXP,IY,IZ).LT.LO) THEN
      IDD=-XYZ(IXP,IY,IZ)
      IN1=ICA(IDD-1)
      IN2=ICA(IDD)
      I=I+PRODV(IC1,IN1)+PRODV(IC1,IN2)+PRODV(IC2,IN1)+PRODV(IC2,IN2)
      ENDIF

      IF(XYZ(IX,IYL,IZ).LT.LO) THEN
      IDD=-XYZ(IX,IYL,IZ)
      IN1=ICA(IDD-1)
      IN2=ICA(IDD)
      I=I+PRODV(IC1,IN1)+PRODV(IC1,IN2)+PRODV(IC2,IN1)+PRODV(IC2,IN2)
      ENDIF

      IF(XYZ(IX,IYP,IZ).LT.LO) THEN
      IDD=-XYZ(IX,IYP,IZ)
      IN1=ICA(IDD-1)
      IN2=ICA(IDD)
      I=I+PRODV(IC1,IN1)+PRODV(IC1,IN2)+PRODV(IC2,IN1)+PRODV(IC2,IN2)
      ENDIF

      IF(XYZ(IX,IY,IZL).LT.LO) THEN
      IDD=-XYZ(IX,IY,IZL)
      IN1=ICA(IDD-1)
      IN2=ICA(IDD)
      I=I+PRODV(IC1,IN1)+PRODV(IC1,IN2)+PRODV(IC2,IN1)+PRODV(IC2,IN2)
      ENDIF

      IF(XYZ(IX,IY,IZP).LT.LO) THEN
      IDD=-XYZ(IX,IY,IZP)
      IN1=ICA(IDD-1)
      IN2=ICA(IDD)
      I=I+PRODV(IC1,IN1)+PRODV(IC1,IN2)+PRODV(IC2,IN1)+PRODV(IC2,IN2)
      ENDIF
APPENDIX E



SAMPLE INPUT 



.25 0.25 0.25 0.34 0.25 0.34 0.25 - Weights for states 

6,8,10,12,14,16,18 



Repulsive potential 
weight. 

Dihedral (torsional/
rotational) angle
weight.

Hydrogen bond (bond 
angle) parameter. 




0.35 



Size of kink jump



States 

Residue 6 8 10 12 14 16 18 Hydrophobicity



2 
3 
4 



111 
111 
111 



0 
0 
0 



1 
1 
1 



1 
1 
1 



1 
1 
1 



-1 
-1 
"1 



Bond angle 
preferences for 
various states. 



10 110 111-1 

11 111101 1 



111 



Glycine 



111 



n 



2 16 37 1 1
3 16 37 1 1 1
Dihedral (torsional/rotational) angles for sequence.
Residue 6 8 10 12 14
10 10 21 1.5 1 



31 14 11 1.5 -1 
43 14 29 1.5 -1 



16 18 Hydrophobicity



PHIL-PHOB, PHIL-PHIL, PHOB-PHOB 1.000 0.250 -0.750 

REPULSIVE INT. AND COOPER. +H-BOND 6.000 -0.150 
SCALING FACTOR FOR DIHEDRAL ANGLE POTENTIAL -0.600 



SAMPLE TERTIARY INTERACTION TABLE



1 = cys    6 = val     11 = thr                  16 = asp
2 = met    7 = tryp    12 = ser                  17 = his
3 = phe    8 = tyr     13 = gln (glutamine)      18 = arg
4 = ile    9 = ala     14 = asn                  19 = lys
5 = leu   10 = gly     15 = glu (glutamic acid)  20 = pro



cys interactions

ahyd(1,1)=-5.44
ahyd(1,2)=-5.05
ahyd(1,3)=-5.63
ahyd(1,4)=-5.03
ahyd(1,5)=-5.03
ahyd(1,6)=-4.46
ahyd(1,7)=-4.76
ahyd(1,8)=-3.89
ahyd(1,9)=-3.38
ahyd(1,10)=-3.16
ahyd(1,11)=-2.88
ahyd(1,12)=-2.86
ahyd(1,13)=-2.73
ahyd(1,14)=-2.59
ahyd(1,15)=-2.08
ahyd(1,16)=-2.66
ahyd(1,17)=-3.63
ahyd(1,18)=-2.70
ahyd(1,19)=-1.54
ahyd(1,20)=-2.92

met interactions

ahyd(2,2)=-6.06
ahyd(2,3)=-6.68
ahyd(2,4)=-6.33
ahyd(2,5)=-6.01
ahyd(2,6)=-5.52
ahyd(2,7)=-6.37
ahyd(2,8)=-4.92
ahyd(2,9)=-3.99
ahyd(2,10)=-3.75
ahyd(2,11)=-3.73
ahyd(2,12)=-3.55
ahyd(2,13)=-3.17
ahyd(2,14)=-3.50
ahyd(2,15)=-3.19
ahyd(2,16)=-2.90
ahyd(2,17)=-3.31
ahyd(2,18)=-3.49
ahyd(2,19)=-3.11
ahyd(2,20)=-4.11



phe interactions

ahyd(3,3)=-6.85
ahyd(3,4)=-6.39
ahyd(3,5)=-6.26
ahyd(3,6)=-5.75
ahyd(3,7)=-6.02
ahyd(3,8)=-4.95
ahyd(3,9)=-4.36
ahyd(3,10)=-3.72
ahyd(3,11)=-3.76
ahyd(3,12)=-3.56
ahyd(3,13)=-3.30
ahyd(3,14)=-3.55
ahyd(3,15)=-3.51
ahyd(3,16)=-3.31
ahyd(3,17)=-4.61
ahyd(3,18)=-3.54
ahyd(3,19)=-2.83
ahyd(3,20)=-3.73



ile interactions

ahyd(4,4)=-6.22
ahyd(4,5)=-6.17
ahyd(4,6)=-5.58
ahyd(4,7)=-5.64
ahyd(4,8)=-4.63
ahyd(4,9)=-4.41
ahyd(4,10)=-3.65
ahyd(4,11)=-3.74
ahyd(4,12)=-3.43
ahyd(4,13)=-3.22
ahyd(4,14)=-2.99
ahyd(4,15)=-3.23
ahyd(4,16)=-2.91
ahyd(4,17)=-3.76
ahyd(4,18)=-3.33
ahyd(4,19)=-2.70
ahyd(4,20)=-3.47



leu

ahyd(5,5)=-5.79
ahyd(5,6)=-5.38
ahyd(5,7)=-5.50
ahyd(5,8)=-4.26
ahyd(5,9)=-3.96
ahyd(5,10)=-3.43
ahyd(5,11)=-3.43
ahyd(5,12)=-3.16
ahyd(5,13)=-3.09
ahyd(5,14)=-2.99
ahyd(5,15)=-2.91
ahyd(5,16)=-2.59
ahyd(5,17)=-3.84
ahyd(5,18)=-3.15
ahyd(5,19)=-2.63
ahyd(5,20)=-3.06



valine

ahyd(6,6)=-4.94
ahyd(6,7)=-5.05
ahyd(6,8)=-4.05
ahyd(6,9)=-3.62
ahyd(6,10)=-3.06
ahyd(6,11)=-2.95
ahyd(6,12)=-2.79
ahyd(6,13)=-2.67
ahyd(6,14)=-2.36
ahyd(6,15)=-2.56
ahyd(6,16)=-2.25
ahyd(6,17)=-3.38
ahyd(6,18)=-2.78
ahyd(6,19)=-2.95
ahyd(6,20)=-2.96


tryp

ahyd(7,7)=-5.42
ahyd(7,8)=-4.44
ahyd(7,9)=-3.93
ahyd(7,10)=-?.37
ahyd(7,11)=-3.31
ahyd(7,12)=-2.95
ahyd(7,13)=-3.16
ahyd(7,14)=-3.11
ahyd(7,15)=-2.94
ahyd(7,16)=-2.91
ahyd(7,17)=-4.02
ahyd(7,18)=-3.56
ahyd(7,19)=-2.49
ahyd(7,20)=-3.66
Sample Tertiary Interaction Table (Cont.)

tyr

ahyd(8,8)=-3.55
ahyd(8,9)=-2.85
ahyd(8,10)=-2.50
ahyd(8,11)=-2.48
ahyd(8,12)=-2.30
ahyd(8,13)=-2.53
ahyd(8,14)=-2.47
ahyd(8,15)=-2.42
ahyd(8,16)=-2.25
ahyd(8,17)=-3.33
ahyd(8,18)=-2.75
ahyd(8,19)=-2.01
ahyd(8,20)=-2.80

ala

ahyd(9,9)=-2.51
ahyd(9,10)=-2.15
ahyd(9,11)=-2.15
ahyd(9,12)=-1.89
ahyd(9,13)=-1.70
ahyd(9,14)=-1.44
ahyd(9,15)=-1.51
ahyd(9,16)=-1.57
ahyd(9,17)=-2.09
ahyd(9,18)=-1.50
ahyd(9,19)=-1.10
ahyd(9,20)=-1.81

gly

ahyd(10,10)=-2.17
ahyd(10,11)=-2.03
ahyd(10,12)=-1.70
ahyd(10,13)=-1.54
ahyd(10,14)=-1.56
ahyd(10,15)=-1.22
ahyd(10,16)=-1.62
ahyd(10,17)=-1.94
ahyd(10,18)=-1.68
ahyd(10,19)=-0.84
ahyd(10,20)=-1.72

thr

ahyd(11,11)=-1.72
ahyd(11,12)=-1.59
ahyd(11,13)=-1.59
ahyd(11,14)=-1.51
ahyd(11,15)=-1.45
ahyd(11,16)=-1.66
ahyd(11,17)=-2.31
ahyd(11,18)=-1.97
ahyd(11,19)=-1.02
ahyd(11,20)=-1.66

serine

ahyd(12,12)=-1.48
ahyd(12,13)=-1.37
ahyd(12,14)=-1.31
ahyd(12,15)=-1.48
ahyd(12,16)=-1.46
ahyd(12,17)=-1.94
ahyd(12,18)=-1.22
ahyd(12,19)=-0.83
ahyd(12,20)=-1.35

glutamine

ahyd(13,13)=-0.89
ahyd(13,14)=-1.36
ahyd(13,15)=-1.33
ahyd(13,16)=-1.26
ahyd(13,17)=-1.85
ahyd(13,18)=-1.85
ahyd(13,19)=-1.02
ahyd(13,20)=-1.73

asn

ahyd(14,14)=-1.59
ahyd(14,15)=-1.43
ahyd(14,16)=-1.33
ahyd(14,17)=-2.01
ahyd(14,18)=-1.41
ahyd(14,19)=-0.91
ahyd(14,20)=-1.43

glu

ahyd(15,15)=-1.18
ahyd(15,16)=-1.23
ahyd(15,17)=-2.27
ahyd(15,18)=-2.07
ahyd(15,19)=-1.60
ahyd(15,20)=-1.40

asp

ahyd(16,16)=-0.96
ahyd(16,17)=-2.14
ahyd(16,18)=-1.98
ahyd(16,19)=-1.32
ahyd(16,20)=-1.19

his

ahyd(17,17)=-2.78
ahyd(17,18)=-2.12
ahyd(17,19)=-1.09
ahyd(17,20)=-2.17

arg

ahyd(18,18)=-1.39
ahyd(18,19)=-0.06
ahyd(18,20)=-1.85

lys

ahyd(19,19)=0.13
ahyd(19,20)=-0.67

pro

ahyd(20,20)=-1.18
SAMPLE OUTPUT



The native contact pairs are: 
2 18 
25 53 
57 163 

Snapshot (interim report) every 5000 Monte Carlo timesteps, 
TEMPERATURE OF THE SYSTEM = 0.340

R2: square of distance between adjacent ...
S2: radius of gyration
Number of contacts: native, any contacts

item    R2     S2      ENERGY   native   any contacts
   1   225   50.23   -302.2?51      0        30
   2   203   48.55   -294.1492      0        30
   3   257   49.63   -298.5013      0        30
   4   227   49.07   -301.3834      0        30
   5   275   50.14   -306.7366      0        30
   6   299   50.46   -304.1194      0        30
   7   221   49.85   -303.1781      0        30
   8   331   48.67   -294.3552      0        28
   9   329   47.53   -294.4433      0        28
  10   257   49.45   -291.8564      0        27
  11   297   49.12   -299.2383      0        29
  12   299   49.15   -299.5342      0        29
  13   297   48.81   -298.7695      0        29
  14   261   48.46   -294.4760      0        29
  15   297   50.30   -300.5060      0        30
  16   201   48.37   -292.5648      0        28
  17   269   48.30   -293.5644      0        28
  18   269   49.49   -298.5347      0        29
  19   275   50.13   -305.4173      0        30
  20   237   50.13   -285.0356      0        27
  21   237   46.07   -288.0940      0        27
  22   363   49.26   -298.0356      0        30
  23   227   49.21   -293.2128      0        29
  24   241   50.60   -297.5958      0        30
FINAL CONFORMATION

PCT/US91/02786

VECTOR1 VECTOR2   SIDE 1/2/3   HANDEDNESS



2 
4 



3 

6 
7 
8 
9 
10 
11 

12 
13 
14" 
15 
16 
17 
18 
19 
20 
21 
22 
23 
24 
25 
26 
27 
28 
29 
30 
31 
32 
33 
34 
35 
36 
37 
38 
39 
40 



0 
. 1 
-2 



2 1 
0 2 
0 1 



10 2 
2-1 0 
2 0-1 
2 0 1 
2 10 
0-1 2 
-2 0 1 

0 -2 -1 
-2 10 
-2 -1 0 
-2 0 -1 



-2 
-1 
1 



0 -2 
0 1 
2 0 
2 0 



-1 0 -2 
0 



2 1 
0 2 

0- 12 
2 0 1 
0 -2 -1 
10 2 
0 12 

1- 2 0 
-1 -2 0 

10-2 
2 10 
10 2 
-1 -2 

0 -1 

1 0 
-1 2 



0 
0 
0 

2 -1 



0 
2 
2 
0 

2 1 

1 -2 

2 -1 
0 
What is claimed is:

1. Method of determining by a machine a 
three-dimensional structure of a protein or portion 
thereof including sidechains, the method comprising 
the steps of: 

specifying a sequence of amino acid residues 
whose native tertiary structure is to be determined; 

specifying local conformation preferences 
for respective residues of the sequence, and 
representing tertiary interactions between all pairs 
of sidechains;

specifying a temperature; 

automatically generating a representation of 
an unfolded chain of the residues in three 
dimensions;

simulating in the machine folding of the 
chain and interactions between all pairs of 
sidechains, in accordance with said conformation 
preferences and said temperature, and producing a 
representation of a corresponding native tertiary 
structure; and

displaying the representation of the 
tertiary structure. 

2. The method of claim 1 where the step of 
displaying includes the step of presenting the 
tertiary structure as a three-dimensional 
representation.

3. The method of claim 1 where the step of 
simulating includes the steps of stopping the 
simulating operation at an intermediate stage, 
specifying another temperature, and resuming the 
simulating operation.

4. Method of determining a three- 
dimensional conformation of a globular protein 
utilizing Monte Carlo dynamics technique with 
asymmetric Metropolis sampling criterion, the method 
comprising the steps of: 

specifying a sequence of amino acid residues 
of the protein; 

generating a three-dimensional 
representation of an unfolded conformation consisting 
of an α-carbon backbone and sidechains in response to
the specified sequence; 

producing from the unfolded conformation, 
using said technique, successive likely conformations 
at a predetermined temperature according to the total 
energy of each conformation; 

selecting from the successive likely 
conformations the lowest total-free-energy tertiary 
conformation which satisfies said criterion; and 

determining the coordinates of the selected 
tertiary conformation for display. 
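The acceptance step behind the Monte Carlo sampling named in claim 4 can be illustrated with a short Python sketch. This section does not spell out how the asymmetric variant differs, so the sketch below uses the standard Metropolis rule as a stand-in and should not be read as the claimed criterion. The function name metropolis_accept is hypothetical, the temperature is taken in the same reduced units as the energy, and the uniform random draw is passed in explicitly so that the behaviour is reproducible.

```python
import math


def metropolis_accept(delta_e, temperature, uniform_draw):
    """Standard Metropolis test (stand-in for the asymmetric criterion).

    delta_e      -- energy of the trial conformation minus the current one
    temperature  -- reduced temperature, same units as the energy
    uniform_draw -- a number in [0, 1) supplied by the caller
    """
    if delta_e <= 0.0:
        # Downhill or neutral moves are always accepted.
        return True
    # Uphill moves are accepted with probability exp(-delta_e / T).
    return uniform_draw < math.exp(-delta_e / temperature)
```

In a simulation loop, each trial move would compute delta_e from the total energy and keep the new conformation only when this test returns True.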

5. The method of claim 4 where the step of 
producing includes the step of determining local 
conformational energetic preferences of the α-carbons.

6. The method of claim 5 where the step of 
producing includes the step of identifying spatially 
close pairs of sidechains in each conformation. 

7. The method of claim 6 where the step of 
producing includes the step of simulating tertiary 
interactions between said spatially close pairs. 



8. The method of claim 7 where the step of 
producing includes the step of determining the sum of 
the effective interaction contact energy between 
respective close pairs based on predetermined 
frequency of contact between said pairs. 

9. The method according to claim 8 where 
the step of determining the effective interaction 
contact energy includes the step of scaling said 
energy to a selected lowest level by referencing 
average interaction contact energies of non-polar 
residues to a hydrophobicity scale.

10. A computer-based model for 
representing, in a three-dimensional cartesian 
coordinate system, a conformation of a protein or 
portion thereof, including the protein's α-carbon
backbone and sidechains of finite surface area, as 
the protein folds from an unfolded sequence of amino 
acid residues to a folded tertiary structure, the 
model comprising: 

a cubic arrangement of lattice sites 
disposed for framing the conformation; 

the cubic arrangement being represented by 
unit vectors (±1,0,0), (0,±1,0), (0,0,±1), where the
distance between adjacent lattice sites is unity; and 
where 

each α-carbon occupies a lattice site
located at a distance of √5 units from its adjacent
α-carbon along a (±2,±1,0) vector or cyclic
permutation thereof.

11. The model of claim 10 where said cubic 
arrangement is a 24-nearest neighbor lattice. 
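The backbone vectors of claims 10 and 11 can be enumerated directly. The Python sketch below is a minimal illustration that assumes all orderings of the components (2, 1, 0), with every sign combination, are allowed, since that is what yields the 24 neighbours named in claim 11. The function name backbone_vectors is hypothetical.

```python
from itertools import permutations, product


def backbone_vectors():
    """Enumerate the (±2, ±1, 0)-type vectors between adjacent α-carbons."""
    vectors = set()
    for ordering in permutations((2, 1, 0)):
        for signs in product((1, -1), repeat=3):
            # The sign on the zero component has no effect; the set
            # removes the resulting duplicates.
            vectors.add(tuple(s * c for s, c in zip(signs, ordering)))
    return sorted(vectors)
```

Each vector has squared length 2² + 1² + 0² = 5, so adjacent α-carbons sit √5 lattice units apart.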
12. The model of claim 11 where each α-carbon
is represented as occupying a central cubic
lattice site plus six adjacent cubic lattice sites
defining a surface of interaction of finite size, and
each sidechain is represented as being embedded in
the lattice and occupying a selected number of
lattice sites located relative to said central site,
the number of sites occupied by the sidechain being
proportional to the number of sites defining the
surface of interaction.

13. A computer-based system for determining 
a three-dimensional structure of a protein or portion 
thereof including sidechains, the system comprising: 

input means for specifying a sequence of
amino acid residues whose native tertiary structure
is to be determined, and for specifying a temperature
and local conformation preferences for respective
residues of the sequence;

first memory means for storing the specified
sequence, temperature, and conformation preferences;

second memory means having a stored program
with routines for performing Monte Carlo dynamics
simulation with asymmetric Metropolis sampling
criterion and for representing tertiary interactions
between all pairs of the sidechains;

processing means coupled to the input, and
first and second memory means, and responsive to the
specified sequence, temperature, and conformation
preferences for, under control of the stored program,
generating a first set of coordinates representing a
conformation of an unfolded chain of the residues in
three dimensions, determining a total free
interaction energy from tertiary interactions between
all pairs of the sidechains, simulating folding of
the chain at the specified temperature and in
accordance with said conformation preferences and 
total interaction energy, and producing a second set 
of coordinates representing a native tertiary 
structure; and 

display means coupled to the processing 
means for displaying the second set of coordinates 
depicting the native tertiary structure in three 
dimensions. 

14. An apparatus for determining a three- 
dimensional structure of a selected protein including 
a plurality of α-carbons comprising:

means for storing a representation of a 
selected sequence of amino acid residues of the 
protein and an initial starting temperature value; 

means for generating a representation of a 
cubic arrangement of lattice sites, including means 
for positioning adjacent sites a unit distance from 
one another and means for positioning a plurality of 
α-carbons on selected lattice sites, each α-carbon
located a distance on the order of √5 from an
adjacent α-carbon;

means for combining said generated 
representation of said cubic arrangement with said 
representation of said selected stored sequence; 

means for producing, in response to said 
temperature and in accordance with said cubic 
arrangement, a representation of one or more folded, 
three-dimensional protein structures; and 

means for comparing said produced 
representation of three-dimensional protein structure 
to a predetermined criterion and for selecting one of 
said produced representation for display only in 
response to a predetermined comparison result. 
15. An apparatus as in claim 14 including
means for interrupting said producing, for storing a
new temperature value and re-initiating said
producing.
later than the priority date claimed

"T" later document published after the international filing date
or priority date and not in conflict with the application but
cited to understand the principle or theory underlying the
invention

"X" document of particular relevance; the claimed invention
cannot be considered novel or cannot be considered to
involve an inventive step

"Y" document of particular relevance; the claimed invention
cannot be considered to involve an inventive step when the
document is combined with one or more other such documents,
such combination being obvious to a person skilled
in the art

"&" document member of the same patent family

IV. CERTIFICATION

Date of the Actual Completion of the International Search: 16 JULY 1991

International Searching Authority: ISA/US

Date of Mailing of this International Search Report: [?]6 AUG 1991

Signature of Authorized Officer
V. [X] OBSERVATIONS WHERE CERTAIN CLAIMS WERE FOUND UNSEARCHABLE

This international search report has not been established in respect of certain claims under Article 17(2)(a) for the following reasons:

1. [X] Claim numbers 1-15, because they relate to subject matter not required to be searched by this Authority, namely:

1) Claims 1-9 & 13-15 relate to "scientific & mathematical theories"
(See PCT Rule 39.1(i)).

2) Claims 10-12 relate to "computer programs" (See PCT Rule 39.1(vi)).

2. [ ] Claim numbers , because they relate to parts of the international application that do not comply with the prescribed requirements to such an extent that no meaningful international search can be carried out, specifically:

3. [ ] Claim numbers , because they are dependent claims not drafted in accordance with the second and third sentences of PCT Rule 6.4(a).

VI. [ ] OBSERVATIONS WHERE UNITY OF INVENTION IS LACKING

This International Searching Authority found multiple inventions in this international application as follows:

1. [ ] As all required additional search fees were timely paid by the applicant, this international search report covers all searchable claims of the international application.

2. [ ] As only some of the required additional search fees were timely paid by the applicant, this international search report covers only those claims of the international application for which fees were paid, specifically claims:

3. [ ] No required additional search fees were timely paid by the applicant. Consequently, this international search report is restricted to the invention first mentioned in the claims; it is covered by claim numbers:

4. [ ] As all searchable claims could be searched without effort justifying an additional fee, the International Searching Authority did not invite payment of any additional fee.

Remark on Protest

[ ] The additional search fees were accompanied by applicant's protest.

[ ] No protest accompanied the payment of additional search fees.